Effective Use of Graphs

Annie Herbert
Medical Statistician
Research & Development Support Unit
Salford Royal (Hope) Hospitals NHS Foundation Trust
annie.herbert@manchester.ac.uk
(0161 720) 2227

Timetable

Time      Task
60 mins   Presentation
20 mins   Coffee Break
90 mins   Practical Tasks in IT Room

Outline
• Graphs for categorical data
• Graphs for numerical data
• Comparing groups
• Additional graphs (covered in other courses)
• Final tips & Computer packages

Categorical Data (1)
Examples:
• Sex – Male/Female
• Blood Group – A/B/AB/O
• Employment Status – Unemployed/Part-time/Full-time

Categorical Data (2)
• Record: Frequency (discrete number) per category
• Summary: Frequency OR percentage/fraction/proportion
• Visually:
- Bar Chart
- Pie Chart
[Figures: bar chart and pie chart, "Official Employment Status of Population of Camberwick Green"; x-axis: Employment Type (Unemployed, Part-time, Full-time); y-axis: Frequency, 0 to 2500]

Example – Discharge Destination (1)

Where Patient Lives   n = 731
Alone                 339 (46.4%)
Family                210 (28.7%)
Home                  180 (24.6%)
Other                   2 (0.3%)

Example – Discharge Destination (2)
[Figure: bar chart, "Discharge Destinations of Patients"; x-axis: Discharge Destination (Alone, Family, Home, Other); y-axis: Frequency, 0 to 400]

Example – Psychiatric Illness/Discharge Destination (1)
Percentages are of each psychiatric illness group:

                      Psychiatric Illness?
Where Patient Lives   No (n=208)   Yes (n=523)
Alone                 117 (56%)    222 (42%)
Family                 81 (39%)    129 (25%)
Home                    9 (4%)     171 (33%)
Other                   1 (0%)       1 (0%)

Example – Psychiatric Illness/Discharge Destination (2)
Percentages are of each discharge destination:

                       Where Patient Lives
Psychiatric Illness?   Alone (n=339)   Family (n=210)   Home (n=180)   Other (n=2)
No                     117 (35%)        81 (39%)          9 (5%)       1 (50%)
Yes                    222 (65%)       129 (61%)        171 (95%)      1 (50%)

Example – Psychiatric Illness/Discharge Destination

Bar Chart
[Figure: grouped bar chart, "Discharge Destination of Patients with and without Psychiatric Illness"; x-axis: Discharge Destination (Alone, Family, Home, Other); y-axis: Frequency, 0 to 250; bars: No, Yes]

Stacked Bar Chart
[Figure: stacked bar chart, "Discharge Destination of Patients with and without Psychiatric Illness"; x-axis: Discharge Destination (Alone, Family, Home, Other); y-axis: Percentage, 0% to 100%; segments: No, Yes]

Re-ordering categories can emphasize a certain effect:
[Figures: two stacked bar charts of the same data, "Discharge Destination of Patients with and without Psychiatric Illness"; x-axis: Psychiatric Illness? (No, Yes); y-axis: Percentage, 0% to 100%; segments stacked as Other/Home/Family/Alone in one chart and Family/Other/Home/Alone in the other]

The frequency axis should always start from 0:
[Figures: two bar charts of the same data, "Discharge Destination (Alone, Family) for Patients with and without Psychiatric Illness"; x-axis: Psychiatric Illness? (No, Yes); y-axis: Frequency, running from 0 to 250 in one chart and from 70 to 230 in the other]

Bar Charts – Advantages & Disadvantages
• Advantages:
- Visually strong.
- Easy to compare between more than one data set.
• Disadvantages:
- Categories can be ‘re-ordered’ to emphasize certain effects.
- Misleading if not used for counts.
- Misleading if the y-axis does not start from 0.

Bar Charts – Things to consider:
• What group differences are you interested in?
• Frequencies or percentages? If percentages, it is up to you to specify the totals.
• Is ‘Other’ a large frequency/percentage?
• Consider the categories as un-ordered when using a stacked bar chart.

Pie Charts
[Figures: two pie charts of discharge destination (Home, Alone, Family, Other), one for Psychiatric Illness? No and one for Psychiatric Illness? Yes]

Pie Charts – Advantages:
• Easy to compare categories, which are equidistant from each other.
• Ordering of categories does not emphasize certain effects as strongly as it does in stacked bar charts.

Pie Charts – Disadvantages:
• No choice between frequencies and percentages (it is up to you to specify the totals).
• Cannot put more than one data set into a pie chart.
• Individual values of the data are lost.
• Limited space: with more than 5 or 6 categories, the chart can look complicated.

Numerical Data (1)
Examples:
• Weight
• Blood Pressure
• Cholesterol Levels

Numerical Data (2)
• Record: Number/Value (discrete or continuous)
• Summary:
- Mean (SD)
- Median (IQR)
• Visually:
- Histogram
- Box plot
- Spread plot

Data – Ages of Patients in Selenium Study

Age
48 36 56 66 65 19 36 59 48
52 67 39 28 58 48 49 39 57
62 74 59 66 45 69 55 63 42
68 54 24 19 70 73 29 34 50

Histogram – Ages of Patients in Selenium Study
[Figure: histogram of the ages above]

Histograms for the same data can vary:
[Figures: alternative histograms of the same data]

Compromise:
[Figure: histogram of the same data]

Beware! A histogram is not a bar chart
[Figures: histogram and bar chart of length of stay; x-axis: Length of stay (days); y-axis: Count, 0 to 400. The histogram uses equal intervals from 10 to 190 days; the bar chart uses the categories <5, 5to7, 7to14, 15to30, 31to60, 61to120, >120]

Histograms – Advantages:
• Visual display of interval frequencies, easy to compare intervals.
• Can give an idea of the distribution of the data, e.g. shape, typical value, spread.

Histograms – Disadvantages:
• Choice of interval width can alter appearance.
• Individual values lost.
• One data set per histogram, so it is difficult to compare data sets.

Box Plot
[Figure: box plot with labels Extreme Outlier, Outlier, Upper Quartile, Median, Lower Quartile]

Box Plots – Advantages:
• Shows many summary statistics in one plot.
• Defines ‘outliers’ explicitly.
• Can have more than one data set in a plot, so it is easy to compare data sets.

Box Plots – Disadvantages:
• More complicated visually than some other types of data plots.
• Individual values lost.
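
The points about interval width and box plot summaries can be checked directly on the age data from the selenium study. The following is a minimal sketch using only the Python standard library; the function names `bin_counts` and `box_plot_summary` are illustrative, and two choices in it are assumptions rather than anything stated on the slides: quartiles are computed with the "inclusive" method, and outliers are taken to be values more than 1.5 × IQR beyond the quartiles (a common convention). Statistics packages may use different rules.

```python
from collections import Counter
from statistics import median, quantiles

# Ages of patients in the selenium study, as listed on the data slide.
AGES = [48, 36, 56, 66, 65, 19, 36, 59, 48,
        52, 67, 39, 28, 58, 48, 49, 39, 57,
        62, 74, 59, 66, 45, 69, 55, 63, 42,
        68, 54, 24, 19, 70, 73, 29, 34, 50]


def bin_counts(values, width, start=0):
    """Count values per interval [lower, lower + width), keyed by lower bound."""
    counts = Counter((v - start) // width for v in values)
    return {start + k * width: counts[k] for k in sorted(counts)}


def box_plot_summary(values):
    """Summary statistics a box plot displays.

    Assumptions: 'inclusive' quartiles and the 1.5 x IQR outlier rule.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "min": min(values),
        "lower_quartile": q1,
        "median": median(values),
        "upper_quartile": q3,
        "max": max(values),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


if __name__ == "__main__":
    # Text histograms: the same data drawn with three interval widths.
    for width in (5, 10, 20):
        print(f"Interval width {width}:")
        for lower, count in bin_counts(AGES, width).items():
            # Ages are whole numbers, so each interval is shown as lower..upper.
            print(f"  {lower:>2}-{lower + width - 1:<2} {'#' * count}")
    print(box_plot_summary(AGES))
```

With an interval width of 10 the counts per decade are 2, 3, 5, 6, 9, 8 and 3; with a width of 20 they collapse to 2, 8, 15 and 11, which hides much of the shape. The median age is 53, and under the assumed 1.5 × IQR rule none of the ages is flagged as an outlier.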
Spread Plots (1)
[Figure: spread plot]

Spread Plots (2)
• Advantages:
- Can give an idea of the distribution of the data, e.g. shape, typical value, spread.
- Shows individual values of the data.
- Can show more than one data set in a plot.
• Disadvantages:
- Not very widely used in journal publications.
- Does not explicitly show summary statistics or outliers as a box plot does.

Relationships in Numerical Data

Serial Measurements
[Figures: "Mean TG (±standard error) at each time point" (y-axis: Mean TG (mM), 0 to 3) and "Change of TG over time" (y-axis: TG (mM), 0 to 3; one line per series: E 1.1, E 2.2, E 3.2, E 4.1, E 5.1, E 6.1); x-axis in both: Time (minutes), -100 to 650]
What information does this give? Mean ± SE, n ≈ 30 per group
Better to look at individual data…
…or give a sensible summary.

Kaplan-Meier Curve (step graph)
Time-to-Event data.
[Figure: "Survival Plot (PL estimates)"; x-axis: Times, 0 to 300; y-axis: Survivor, 0.00 to 1.00; legend: 1, 0]

Bland-Altman Plots (scatter plots)
How well do two methods of measurement agree?
[Figure: "Agreement Plot (95% limits of agreement)"; x-axis: mean, 200 to 450; y-axis: difference, -100 to 100]

Forest Plots (Hi-Lo-Close charts)
Meta-Analysis.
[Figure: "Forest (meta-analysis) plot"; x-axis from 0.2 to 2, showing the estimates below]

AH       0.72 (0.48, 1.08)
SW       0.68 (0.45, 1.03)
MT       0.80 (0.60, 1.07)
KW       0.80 (0.47, 1.36)
Pooled   0.75 (0.50, 1.14)

Final Pointers:
• Before plotting, think about the type of data and what you would like to compare.
• Show all data rather than summaries where possible.
• Label axes clearly. The graph should ‘stand alone’.
• When comparing groups, make sure the outcome is on the same scale.
• Make sure any colours used are sufficiently different from each other, and avoid red/green combinations.

Using a Computer Package:

Package       Advantages                             Disadvantages
SPSS          Produces journal-quality graphs        • Difficult to start with
                                                     • Expensive
StatsDirect   When copied and pasted, these graphs   Difficult to draw bar/pie charts
              may be edited in Word
Excel         Easy to use for bar/pie charts         Not a statistics package
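
Whichever package is used, the percentages to plot have to be chosen first. The two psychiatric illness/discharge destination tables in the categorical data section use the same counts but different totals, which is why a bar chart of percentages should always state its totals. The following is a minimal Python sketch of both calculations; the function names are illustrative, and only the counts come from the slides.

```python
# Counts from the psychiatric illness / discharge destination tables.
COUNTS = {
    "Alone":  {"No": 117, "Yes": 222},
    "Family": {"No": 81,  "Yes": 129},
    "Home":   {"No": 9,   "Yes": 171},
    "Other":  {"No": 1,   "Yes": 1},
}


def percent_of_illness_group(counts):
    """Percentage of each illness group (No, Yes) living at each destination."""
    groups = ("No", "Yes")
    totals = {g: sum(row[g] for row in counts.values()) for g in groups}
    return {
        dest: {g: 100 * row[g] / totals[g] for g in groups}
        for dest, row in counts.items()
    }


def percent_of_destination(counts):
    """Percentage of each destination's patients with and without illness."""
    return {
        dest: {g: 100 * n / sum(row.values()) for g, n in row.items()}
        for dest, row in counts.items()
    }


if __name__ == "__main__":
    by_group = percent_of_illness_group(COUNTS)
    by_dest = percent_of_destination(COUNTS)
    for dest in COUNTS:
        print(
            f"{dest:<7}"
            f" of group: No {by_group[dest]['No']:.0f}%, Yes {by_group[dest]['Yes']:.0f}%"
            f" | of destination: No {by_dest[dest]['No']:.0f}%, Yes {by_dest[dest]['Yes']:.0f}%"
        )
```

For example, 33% of patients with a psychiatric illness live in a home, but 95% of patients living in a home have a psychiatric illness. Both figures are correct; they answer different questions.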