Supplemental Digital Content 2.doc Steps in initial variable selection

General: All models used the same 76 variables (as reported in Supplemental Digital Content 1). All models were developed on 50% of the dataset and tested on the remaining 50%. Models were generated using IBM SPSS Modeler ver. 21.

Generation of neural network (NN) models: We used the Multilayer Perceptron (MLP) method to generate five different NN models. The models differed from one another in the random seed applied.

Generation of decision trees: We generated three types of decision trees: Chi-squared Automatic Interaction Detection (CHAID), a classification system that uses chi-square tests to identify optimal cut-points; C5.0, a recursive classification system based on entropy rules; and Classification and Regression Tree (CART), a recursive classification system based on impurity rules.

Model testing: We tested the ability of each of the eight models (the five NN models and the three decision trees) to discriminate between patients with and without a readmission at two cut points: the 5% and the 10% of patients at highest predicted risk. The following parameters were tested: positive predictive value (PPV, or 'hit rate'), the percentage of people who were actually readmitted among those determined as high-risk for readmission, at each of the 5% and 10% cut points according to the various models; and Lift, the ratio between PPV and the average occurrence in the population. The PPV ranged between 29% and 35% for the 10% highest risk and between 37% and 42% for the 5% highest risk. The Lift ranged between 1.8 and 2.8 across the 5% and 10% highest-risk cut points.

Model selection: Of the eight models, we chose those with a PPV of 30% or higher (for the 10% highest risk) and a Lift of 2.0 or above. Five models met these criteria: two of the neural network models (termed NN1 and NN2) and the three decision trees (CHAID, C5.0, and CART).

Variable selection: In each model we compared the 20 top-ranking variables. This ranking is provided by the Modeler according to the contribution of each variable to the model. Variables that ranked among the top 20 in three of the five models were entered into the multivariate logistic regression model.
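The testing and variable selection steps described above can be illustrated with a short Python sketch. This is not the authors' implementation (the analysis was carried out in IBM SPSS Modeler); the function names `ppv_and_lift` and `select_variables`, the variable names, and the toy data are all hypothetical. The sketch also assumes that "three of the five models" means at least three, which the text does not state explicitly.

```python
from collections import Counter


def ppv_and_lift(scores, readmitted, top_fraction):
    """Return (PPV, lift) for the top_fraction of patients by predicted risk.

    scores       -- predicted risk per patient (higher = riskier)
    readmitted   -- 1 if the patient was actually readmitted, else 0
    top_fraction -- e.g. 0.05 or 0.10 for the 5% / 10% highest risk
    """
    n = len(scores)
    k = max(1, round(n * top_fraction))
    # Rank patient indices by predicted risk, highest first.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    flagged = order[:k]
    # PPV: share of flagged (high-risk) patients who were readmitted.
    ppv = sum(readmitted[i] for i in flagged) / k
    # Lift: PPV relative to the readmission rate in the whole population.
    base_rate = sum(readmitted) / n
    return ppv, ppv / base_rate


def select_variables(rankings, top_n=20, min_models=3):
    """Keep variables ranked in the top_n of at least min_models models.

    rankings -- dict mapping model name to its variables, ordered from
                most to least important (assumed input format).
    """
    counts = Counter()
    for ranked in rankings.values():
        counts.update(set(ranked[:top_n]))
    return sorted(v for v, c in counts.items() if c >= min_models)


# Toy data: 20 patients, 4 of whom were readmitted (illustrative only).
toy_scores = [i / 20 for i in range(20)]
toy_outcomes = [1 if i in (3, 10, 17, 19) else 0 for i in range(20)]

# Hypothetical importance rankings, truncated to three variables each.
toy_rankings = {
    "NN1": ["age", "prior_adm", "los"],
    "NN2": ["prior_adm", "age", "dx"],
    "CHAID": ["age", "dx", "los"],
    "C5.0": ["los", "prior_adm", "age"],
    "CART": ["dx", "age", "prior_adm"],
}

if __name__ == "__main__":
    print(ppv_and_lift(toy_scores, toy_outcomes, 0.10))
    print(select_variables(toy_rankings, top_n=2, min_models=3))
```

With real data, `ppv_and_lift` would be called once per model at each of the two cut points, and the model selection thresholds (PPV of 30% or higher at the 10% cut point, Lift of 2.0 or above) would then be applied to the results.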
