Practice: Data Exploration
Topic 1: Reading Histograms & Box Plots

SUPPORT Study: Background
SUPPORT = Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatment
• 4301 hospitalized patients from 5 academic hospitals were enrolled in phase 1 of the study; 4028 patients were enrolled in phase 2.
• The overall goal was to develop a model that estimates overall survival for patients based on their diagnosis and severity of illness.
** The main article for this study is cited on the last slide and is posted in Collab (Resources > Data Sets > SUPPORT Study) for those interested in reading more about the study.

Histograms & Boxplots - How to Describe

For Histograms, we look at:
1. Distribution/Shape
   • Symmetric or skewed
   • How many peaks? (unimodal, bimodal, or multimodal)
   • Overall distribution (normal, uniform, etc.)
2. Center
   • Median, mean
3. Variation/Spread
   • Observe the min/max and whether there is a large or small amount of variation (standard deviation)
4. Potential outliers?

For Boxplots, we look at:
1. 5-number summary
   • The five values used to build the box plot: min, Q1, median, Q3, max
2. Shape
   • Symmetric or skewed (compare the lengths of the whiskers)
3. Outliers
   • Check with the outlier rule (values more than 1.5 × IQR below Q1 or above Q3); R automatically checks for outliers and draws any it finds as individual points

White Blood Cell Counts - Exploration
Descriptive Statistics for White Blood Cell Count
Min        0.050
Q1         6.899
Median    10.450
Q3        15.500
Max      100.000
Mean      12.400
Std Dev    8.9525
n = 1000 (24 missing observations)

Respiratory Rate - Exploration
Descriptive Statistics for Respiratory Rate
Min        0.00
Q1        18.00
Median    24.00
Q3        29.00
Max       64.00
Mean      23.49
Std Dev    9.2557
n = 1000 (no missing observations)

Data obtained from: http://biostat.mc.vanderbilt.edu/wiki/Main/DataSets
Variable information: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/Csupport.html
Article: Knaus WA, Harrell FE, Lynn J, et al. (1995). The SUPPORT prognostic model: Objective estimates of survival for seriously ill hospitalized adults. Annals of Internal Medicine 122:191-203.
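The boxplot checklist (5-number summary, whisker lengths, outlier rule) can be checked numerically against the two summary tables for white blood cell count and respiratory rate. The Python sketch below is an illustration and is not part of the course materials. It assumes the common 1.5 × IQR outlier rule and uses the tabled Q1/Q3 values as-is. The names Summary, outlier_fences, check, and five_number_summary are hypothetical helpers introduced here. Quartile conventions differ between tools (R's boxplot builds its box from Tukey's hinges), so fences computed in R may differ slightly.

```python
from __future__ import annotations

import statistics
from dataclasses import dataclass


@dataclass
class Summary:
    """Five-number summary plus mean (hypothetical helper type)."""
    name: str
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float
    mean: float


def five_number_summary(name: str, values: list[float]) -> Summary:
    """Build a Summary from raw values.

    Quartiles come from statistics.quantiles(method="inclusive"); other
    software (including R's boxplot) may place Q1/Q3 slightly differently.
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return Summary(name, min(values), q1, med, q3, max(values), statistics.fmean(values))


def outlier_fences(s: Summary, k: float = 1.5) -> tuple[float, float]:
    """Lower and upper fences of the k x IQR outlier rule (k = 1.5 by convention)."""
    iqr = s.q3 - s.q1
    return s.q1 - k * iqr, s.q3 + k * iqr


def check(s: Summary) -> dict[str, object]:
    lower, upper = outlier_fences(s)
    return {
        "iqr": s.q3 - s.q1,
        "fences": (lower, upper),
        # Min/max only show whether at least one outlier exists on each side,
        # not how many; counting them requires the raw data.
        "low_outlier": s.minimum < lower,
        "high_outlier": s.maximum > upper,
        # Rough shape cues (heuristics, not formal tests): a longer upper half
        # of the box and a mean above the median both point toward right skew.
        "upper_half_longer": (s.q3 - s.median) > (s.median - s.q1),
        "mean_above_median": s.mean > s.median,
    }


# Values copied from the two descriptive-statistics tables.
wbc = Summary("White blood cell count", 0.050, 6.899, 10.450, 15.500, 100.000, 12.400)
rr = Summary("Respiratory rate", 0.00, 18.00, 24.00, 29.00, 64.00, 23.49)

for s in (wbc, rr):
    r = check(s)
    lo, hi = r["fences"]
    print(f"{s.name}: IQR = {r['iqr']:.2f}, fences = ({lo:.2f}, {hi:.2f}), "
          f"low outlier: {r['low_outlier']}, high outlier: {r['high_outlier']}")

# Hypothetical raw values (not SUPPORT data), only to show the raw-data route.
toy = five_number_summary("toy", [1, 2, 3, 4, 5, 6, 7, 8, 100])
print(toy.q1, toy.median, toy.q3, outlier_fences(toy))  # 3.0 5.0 7.0 (-3.0, 13.0)
```

With the tabled quartiles, white blood cell count gets fences of about (-6.00, 28.40). The maximum of 100.000 is therefore flagged as a high outlier, and nothing falls below the lower fence. Its longer upper box half (5.05 vs 3.55) and a mean above the median are consistent with right skew. Respiratory rate gets fences of (1.50, 45.50), so both the minimum of 0.00 and the maximum of 64.00 lie outside them.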