Descriptive Data Analytics
Pneumonia Length of Stay

Background: A local hospital system in the upper Midwest comprises three hospitals (1, 2, and 3) and is concerned about its performance in caring for pneumonia patients. Pneumonia is one of the conditions for which Medicare penalizes low-performing hospitals, defined as those with higher-than-average risk-adjusted re-hospitalization rates. As a starting point, the COO has asked you to take a closer look at the hospital system’s Medicare patient population with pneumonia and to examine what factors are associated with length of stay (LOS). Discharge abstracts for the DRGs corresponding to pneumonia were extracted from the health system’s databases. There are 1444 observations in the data set. The analyst then constructed some variables from the original data. The variables are listed in the Data Dictionary below. The data are available on the course site in Excel format.

Task: Your goal is to conduct a statistical analysis and interpret your findings to address the following questions:

(1) What is the distribution of length of stay for the hospital system overall? How does the distribution of length of stay differ by hospital? Is the average LOS different by hospital? When examining the distribution of a continuous variable, be sure to examine central tendency, dispersion, shape, and skew. The presence of outliers is important to note here as well. To examine whether average LOS differs, you will need to rely on testing methods you learned earlier.

(2) What is the demographic (age, female, white, lowincome) and clinical (number of diagnoses, diabetes, copd, chf) profile of pneumonia patients by hospital? Are there significant differences by hospital with respect to the demographic and clinical profile of patients? If so, in what ways and by how much do they differ? To respond to these questions, you will need to rely on testing methods that you acquired earlier.
You may wish to summarize your findings in a table and then provide additional interpretation of your results.

(3) What are the patient demographic, clinical, and hospital operations-related factors that are associated with a patient’s length of stay? To address this question, please estimate and fully interpret the following multiple linear regression model.

$$\text{LOS} = \beta_0 + \beta_1\,\text{AGE} + \beta_2\,\text{Female} + \beta_3\,\text{White} + \beta_4\,\text{Lowincome} + \beta_5\,\text{COPD} + \beta_6\,\text{CHF} + \beta_7\,\text{Diabetes} + \beta_8\,\text{NDX} + \beta_9\,\text{Hosp1} + \beta_{10}\,\text{Hosp2} + \varepsilon$$

Here, the betas are the parameters you will estimate for each of the variables in the model. The error term (epsilon) captures the unobserved factors that influence LOS but are not captured by the model. Please be sure to assess the overall goodness of fit, identify which factors in the model exhibit a statistically significant relationship with length of stay, and indicate the direction and magnitude of any statistically significant associations.

(4) What are the key findings based on the analysis above? What do the results tell you about performance across the hospitals? What might the hospital system want to investigate further, either in terms of additional variables or analyses, that would contribute to its understanding of variation in performance related to pneumonia?

Deliverable: The deliverable is a memorandum that addresses the questions above using the following structure (suggested page allocations are in parentheses).
• Background information and analysis objective (0.5-1 page)
• Analyses corresponding to questions 1, 2, and 3 (2-4 pages)
• Summary and managerial implications corresponding to question 4 (1 page)
• Appendix containing labeled tables and Excel output in a Word document (15 pages maximum)

Miscellaneous pointers:

• Please choose a particular level of significance for all of your hypothesis tests (e.g., α = .05) or report p-values. In your write-up, please clearly document the test statistic and critical value, or the p-value.
• While you don’t need to explicitly state your null and alternative hypotheses for each test that you conduct, please be sure your conclusion or inference makes clear what your result means.
• Please be sure to label any tables, graphs, or output (e.g., Table 1, Figure 1, etc.) in your Appendix.
• Include complete references for any cited articles (either as footnotes or endnotes).

Data Dictionary:

Variable Name   Definition
Key             Unique patient identifier
LOS             Length of stay (days)
Age             Patient’s age
Female          1: patient is female, 0 if male
White           1: white, 0 if non-white
Lowincome       1: patient resides in a low-income zip code (below median), 0 if not
COPD            1: patient has COPD, 0 if not
CHF             1: patient has congestive heart failure, 0 if not
Diabetes        1: patient has diabetes diagnosis, 0 if not
Ndx             Total number of diagnoses reported on record
Hosp1           1: admitted to hospital 1, 0 if not
Hosp2           1: admitted to hospital 2, 0 if not
Hospcat         1: hospital 1; 2: hospital 2; 3: hospital 3

Final Notes: This project must be done individually. If you have questions about Excel or statistical concepts, you are more than welcome to ask the instructor. I will be happy to assist you.
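
Optional illustration: the regression in question 3 includes Hosp1 and Hosp2 but no indicator for hospital 3, so hospital 3 is the reference category and the coefficients on Hosp1 and Hosp2 are differences in expected LOS relative to hospital 3, holding the other variables fixed. The sketch below shows that setup in Python on a reduced model (Age, Hosp1, Hosp2 only). It is a minimal sketch under stated assumptions, not the required Excel workflow: the records are made-up toy values (not the course data set), the toy LOS is generated without noise so the known coefficients are recovered exactly, and the helper names `solve_linear_system`, `fit_ols`, and `r_squared` are hypothetical. Only the standard library is used so the example is self-contained; in practice a regression tool such as Excel's or a statistics library would also report standard errors and p-values.

```python
# Illustrative only: toy records, NOT the course data set.

def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfectly collinear columns)")
        m[col], m[pivot] = m[pivot], m[col]
        pivot_val = m[col][col]
        m[col] = [v / pivot_val for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [rv - factor * cv for rv, cv in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def fit_ols(rows, y):
    """Return OLS coefficients by solving the normal equations (X'X) b = X'y."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def r_squared(rows, y, coefs):
    """Share of the variation in y explained by the fitted model."""
    fitted = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Toy (age, hospcat) records; hospcat follows the data dictionary coding 1/2/3.
toy = [(70, 1), (82, 1), (65, 2), (90, 2), (75, 3), (88, 3), (68, 1), (79, 3)]

# Predictors: Age, Hosp1, Hosp2. Hospital 3 gets zeros in both dummies,
# which makes it the reference category.
rows = [(float(age), 1.0 if h == 1 else 0.0, 1.0 if h == 2 else 0.0)
        for age, h in toy]

# Toy LOS generated WITHOUT noise from an assumed relationship:
# LOS = 1 + 0.05*Age + 1.5*Hosp1 - 0.5*Hosp2
los = [1.0 + 0.05 * a + 1.5 * h1 - 0.5 * h2 for a, h1, h2 in rows]

coefs = fit_ols(rows, los)          # [intercept, Age, Hosp1, Hosp2]
fit_r2 = r_squared(rows, los, coefs)

for name, value in zip(["Intercept", "Age", "Hosp1", "Hosp2"], coefs):
    print(f"{name:<10}{value: .4f}")
print(f"R-squared {fit_r2: .4f}")
```

In this toy setup the Hosp1 coefficient reads as "patients at hospital 1 stay 1.5 days longer than otherwise similar patients at hospital 3", and the Hosp2 coefficient as "0.5 days shorter than at hospital 3". The same reading applies to the Hosp1 and Hosp2 coefficients in the full model, with statistical significance judged from the regression output.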