01 Assignment DA

Descriptive Data Analytics
Pneumonia Length of Stay
Background: A local hospital system in the upper Midwest comprises 3 hospitals (1, 2, and 3)
and is concerned about its performance with respect to caring for pneumonia patients.
Pneumonia is one of the conditions for which Medicare is penalizing low-performing hospitals,
defined as those that have higher-than-average risk-adjusted re-hospitalization rates.
As a starting point, the COO has asked you to take a closer look at the hospital system’s
Medicare patient population with pneumonia and to examine what factors are associated with
length of stay. Discharge abstracts for the DRGs corresponding to pneumonia were extracted
from the health system’s databases.
There are 1444 observations in the data set. The analyst then constructed some variables from
the original data. The variables are listed below. The data are available on the course site in
Excel format.
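The assignment expects the analysis in Excel, but the LOS summaries that question (1) asks for can also be sketched in a few lines of Python as a cross-check. This is a minimal sketch under stated assumptions: the worksheet is assumed to have been exported to CSV with the data-dictionary names in the header row (only LOS and Hospcat are used here), and the function names `load_rows`, `describe`, `group_los`, and `one_way_anova` are illustrative, not part of the course materials. It reports central tendency, dispersion, skew, and outliers flagged by the 1.5 × IQR rule, and computes a one-way ANOVA F statistic for whether mean LOS differs across the three hospitals.

```python
import csv
import io
import statistics


def load_rows(csv_text):
    """Parse CSV text into one dict per patient.

    Assumes (hypothetically) the Excel sheet was exported to CSV with the
    data-dictionary names as headers. Key stays text; other columns are numeric.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {name: (value if name == "Key" else float(value)) for name, value in row.items()}
        for row in reader
    ]


def describe(values):
    """Central tendency, dispersion, skew, and 1.5 x IQR outliers for one variable."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    # Adjusted Fisher-Pearson skewness: the formula Excel's SKEW() uses.
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    # method="inclusive" gives the same quartiles as Excel's QUARTILE.INC.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": n, "mean": mean, "median": median, "sd": sd,
        "min": min(values), "max": max(values), "q1": q1, "q3": q3,
        "skew": skew,
        "outliers": sorted(x for x in values if x < low_fence or x > high_fence),
    }


def group_los(rows):
    """Map each Hospcat code (1, 2, 3) to that hospital's LOS values."""
    groups = {}
    for row in rows:
        groups.setdefault(int(row["Hospcat"]), []).append(row["LOS"])
    return dict(sorted(groups.items()))


def one_way_anova(groups):
    """F statistic and (df_between, df_within) for H0: all hospital means are equal."""
    samples = list(groups.values())
    means = [statistics.mean(s) for s in samples]
    all_values = [x for s in samples for x in s]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

To complete the test, compare F with the critical value from Excel's F.INV.RT(α, df_between, df_within), or obtain the p-value with F.DIST.RT(F, df_between, df_within). The 1.5 × IQR fence is one common outlier convention; others (such as |z| > 3) are also used, so state whichever rule you apply.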
Task: Your goal is to conduct a statistical analysis and interpret your findings to address the following questions:

(1) What is the distribution of length of stay (LOS) for the hospital system overall? How does the distribution of LOS differ by hospital? Is the average LOS different by hospital? When examining the distribution of a continuous variable, be sure to examine central tendency, dispersion, shape, and skew. The presence of outliers is important to note here as well. To examine whether average LOS differs, you will need to rely on testing methods you learned earlier.

(2) What is the demographic (age, female, white, lowincome) and clinical (number of diagnoses, diabetes, COPD, CHF) profile of pneumonia patients by hospital? Are there significant differences by hospital with respect to the demographic and clinical profile of patients? If so, in what ways and by how much do they differ? To respond to these questions, you will need to rely on testing methods that you acquired earlier. You may wish to summarize your findings in a table and then provide additional interpretation of your results.

(3) What are the patient demographic, clinical, and hospital operations-related factors that are associated with a patient’s length of stay? To address this question, please estimate and fully interpret the following multiple linear regression model:

$$\text{LOS} = \beta_0 + \beta_1\,\text{Age} + \beta_2\,\text{Female} + \beta_3\,\text{White} + \beta_4\,\text{Lowincome} + \beta_5\,\text{COPD} + \beta_6\,\text{CHF} + \beta_7\,\text{Diabetes} + \beta_8\,\text{Ndx} + \beta_9\,\text{Hosp1} + \beta_{10}\,\text{Hosp2} + \varepsilon$$

Here, the betas are the parameters you will estimate for each of the variables in the model. The error term (epsilon) captures the unobserved factors that influence LOS but are not captured by the model. Please be sure to assess the overall goodness of fit, identify which factors in the model exhibit a statistically significant relationship with length of stay, and indicate the direction and magnitude of any statistically significant associations.

(4) What are the key findings based on the analysis above? What additional variables or analyses might you want to recommend as part of future analysis on the topic? What do the results tell you about the performance across the hospitals? What might the hospital system want to investigate further, either in terms of variables or analyses, that would contribute to its understanding of variation in performance related to pneumonia?

Deliverable: The deliverable is a memorandum that addresses the questions above using the following structure (suggested page allocations are in parentheses):
• Background information and analysis objective (0.5-1 page)
• Analyses corresponding to questions 1, 2, and 3 (2-4 pages)
• Summary and managerial implications corresponding to question 4 (1 page)
• Appendix containing labeled tables and Excel output in a Word document (up to 15 pages maximum)

Miscellaneous pointers:
• Please choose a particular level of significance for all of your hypothesis tests (e.g., α = .05) or report p-values.
• In your write-up, please be clear in your documentation of the test statistic and critical value or the p-value. While you don’t need to explicitly state your null and alternative hypotheses for each test that you conduct, please make sure your conclusion or inference states clearly what each result means.
• Please be sure to label any tables, graphs, or output (e.g., Table 1, Figure 1, etc.) in your Appendix.
• Include complete references for any cited articles (either as footnotes or endnotes).

Data Dictionary:
Variable Name   Definition
Key             Unique patient identifier
LOS             Length of stay (days)
Age             Patient’s age
Female          1: patient is female, 0 if male
White           1: white, 0 if non-white
Lowincome       1: patient resides in a low-income zip code (below median), 0 if not
COPD            1: patient has COPD, 0 if not
CHF             1: patient has congestive heart failure, 0 if not
Diabetes        1: patient has diabetes diagnosis, 0 if not
Ndx             Total number of diagnoses reported on record
Hosp1           1: admitted to hospital 1, 0 if not
Hosp2           1: admitted to hospital 2, 0 if not
Hospcat         1: hospital 1; 2: hospital 2; 3: hospital 3
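Because the regression model in question (3) includes Hosp1 and Hosp2 but no indicator for hospital 3, hospital 3 is the reference category: β9 and β10 are the differences in expected LOS for hospitals 1 and 2 relative to hospital 3, holding the other variables constant. A third indicator would be perfectly collinear with the intercept, since Hosp1 + Hosp2 + Hosp3 = 1 for every patient. The assignment expects Excel's regression output; the NumPy sketch below is only an illustrative cross-check that computes the same kinds of quantities (coefficients, standard errors, t statistics, R², adjusted R², overall F). The helper names `hospital_dummies` and `fit_ols` are hypothetical.

```python
import numpy as np


def hospital_dummies(hospcat):
    """Build Hosp1 and Hosp2 from Hospcat; hospital 3 is the omitted reference."""
    hospcat = np.asarray(hospcat)
    return (hospcat == 1).astype(float), (hospcat == 2).astype(float)


def fit_ols(y, X, names):
    """OLS with an intercept: coefficients, standard errors, t statistics, and fit."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    n, k = X.shape  # k includes the intercept, so model df = k - 1
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    sigma2 = sse / (n - k)  # residual variance with n - k degrees of freedom
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    labels = ["Intercept"] + list(names)
    return {
        "coef": dict(zip(labels, beta)),
        "se": dict(zip(labels, se)),
        "t": dict(zip(labels, beta / se)),
        "r2": r2,
        "adj_r2": adj_r2,
        "F": (r2 / (k - 1)) / ((1.0 - r2) / (n - k)),
        "df": (k - 1, n - k),
    }
```

With the columns held in a dict of arrays named `data` (hypothetical), the model would be fit as `fit_ols(data["LOS"], np.column_stack([data[c] for c in names]), names)` with `names = ["Age", "Female", "White", "Lowincome", "COPD", "CHF", "Diabetes", "Ndx", "Hosp1", "Hosp2"]`. If no rows are dropped, there are 1444 - 11 = 1433 residual degrees of freedom, so a coefficient is significant at α = .05 when |t| exceeds about 1.96; exact p-values come from Excel's T.DIST.2T(ABS(t), 1433), and the overall F test from F.DIST.RT(F, 10, 1433).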
Final Notes:
This project must be done individually.
If you have questions about Excel or statistical concepts, you are more than welcome to ask the
instructor. I will be happy to assist you.