Statistics 101: Formula-Free Statistics
Author: Nick Barrowman, PhD
Date: July 16th, 2012

Conflict of interest disclosure
• I do not hold any research grants funded by industry.
• I have done paid consulting work for Mead Johnson Nutrition [Canada] Co.
• I have no other relevant financial relationships with members of the pharmaceutical industry or medical supply companies.

Learning objectives
By the end of this talk you will be able to:
• Understand the principal concepts of statistics
• Interpret statistics commonly reported in the medical literature
• Identify "red flags" in research reports that may be signs of trouble

Variability
• Patients vary.
• Physicians vary.
• Nurses vary.
• Hospitals vary.
• Measurements vary.
• Disease states vary.
• Immune response varies.
• Drug adherence varies.
…

Consequences of variability
• Variability means that the patterns we notice may be illusory and we may miss the real patterns.
• Variability means that our conclusions will always be tentative.
• Variability means that we’ll have to deal with uncertainty.

“Uncertainty is an uncomfortable position. But certainty is an absurd one.” - Voltaire

What is Statistics?
• Statistics is the science of variability.

The principal concepts of statistics
• Variability can be modeled using probability.
• This provides a framework for drawing inferences and quantifying our uncertainty.

Fundamental statistical ideas
• The population and the sample
• Confidence intervals
• Hypothesis tests
• P-values

Population vs. sample
[Diagram: a random sample is drawn from the population. The sample mean blood pressure is obtained by calculation; the population mean blood pressure is reached by inference.]

A typically cryptic description
“The mean systolic blood pressure in group A was 110 mmHg, while in group B it was 104 mmHg … There was no difference in systolic blood pressure between the groups.”
Statements like this can be perplexing. For a start, how can there be no difference when there is clearly a difference?

Population vs. sample
[Diagram: a random sample is drawn from a population ranging from low BP to high BP. The sample difference between groups in mean blood pressure is obtained by calculation; the population difference between groups in mean blood pressure is reached by inference.]

A typically cryptic description
• The first sentence (110 mmHg vs. 104 mmHg) describes the sample.
• The second sentence (“no difference”) is a statement about the population.
• What is meant is “no statistically significant difference”. More on this later …

“The mean systolic blood pressure in group A was 110 mmHg, while in group B it was 104 mmHg … There was no statistically significant difference in systolic blood pressure between the groups.”
The comparison concerns the mean. Even if the means do not differ significantly between groups, systolic blood pressure varies within each group.
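The gap between a sample difference and a population difference can be illustrated with a small simulation. The sketch below is not part of the original talk: it assumes two groups whose population means are identical, and every number in it (population mean, SD, group size, number of simulated studies, seed) is hypothetical and chosen only for illustration.

```python
import random
import statistics


def simulate_sample_differences(n_per_group=20, n_studies=2000,
                                pop_mean=107.0, pop_sd=5.0, seed=1):
    """Simulate many two-group studies in which the population means are equal.

    All default values are hypothetical. Returns the observed differences
    in sample mean blood pressure (group A minus group B), one per study.
    """
    rng = random.Random(seed)
    differences = []
    for _ in range(n_studies):
        # Both groups are drawn from the same population distribution.
        group_a = [rng.gauss(pop_mean, pop_sd) for _ in range(n_per_group)]
        group_b = [rng.gauss(pop_mean, pop_sd) for _ in range(n_per_group)]
        differences.append(statistics.mean(group_a) - statistics.mean(group_b))
    return differences


if __name__ == "__main__":
    diffs = simulate_sample_differences()
    share_large = sum(abs(d) >= 2.0 for d in diffs) / len(diffs)
    print(f"Average sample difference: {statistics.mean(diffs):.2f} mmHg")
    print(f"Studies with a sample difference of 2 mmHg or more: {share_large:.0%}")
```

Under these assumptions the sample means rarely match exactly even though the population difference is zero, which is why a difference seen in a sample does not by itself establish a difference in the population.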
A typically cryptic description
“The mean systolic blood pressure in group A was 110 mmHg (SD = 4.2 mmHg), while in group B it was 104 mmHg (SD = 5.0 mmHg) … There was no statistically significant difference in mean systolic blood pressure between the groups.”
SD is the standard deviation. Estimates of variability are essential.

Red flag
Failing to report estimates of variability
Variability is always present. Failing to report estimates of variability can be misleading and can also make it impossible for the reader to verify results.

Comparisons
• Many studies focus on comparisons
  • between groups
  • between a single group and a reference standard.
• e.g. Compare weight gain:
  • On average, did group A gain more than group B?
  • On average, did people in a single group gain weight? Here the reference standard is no change.

The null hypothesis
• The null hypothesis is a default assumption about the population, usually that there is no difference.
• For example: In the population, there is no difference between the mean blood pressure for groups A and B.

Weight gain example
• With two groups, the null hypothesis is:
  • Mean weight gain is the same in the two groups
  • i.e. Difference in mean weight gain = 0
• With one group, the null hypothesis is:
  • Mean weight gain = 0
• In these two cases, zero is the “null value”

Other examples of the null hypothesis
• Example: Mortality in two groups
  • Mortality rate in group A = mortality rate in group B
  • i.e. Relative risk of mortality is 1.
  • So 1 is the null value.
• Example: IQ in a single group
  • Mean IQ is 100.
  • So 100 is the null value.

Hypothesis testing
An example: Bedside Limited Echocardiography by the Emergency Physician Is Accurate During Evaluation of the Critically Ill Patient. Pershad et al. Pediatrics 2004;114;e667-e671.
Goal: to compare echocardiography measurements made by emergency physicians and experienced pediatric echocardiography providers.
Hypothesis testing
[Table: shortening fraction, SF (%), for each patient as measured by the emergency physician ("Emerg Doc") and by the echocardiographer ("Cardiographer")]

Hypothesis testing
[Figure: difference between measurements made by echocardiographers and emergency physicians]
• On average echocardiographers’ estimates were higher by 4.4%.
• Could this difference be a chance occurrence?

Hypothesis testing
• We need to test the hypothesis that in the population there is no difference.
• We often report a p-value: the probability of observing a difference that is at least as extreme as what was observed, assuming there is no difference in the population.
• A p-value < 0.05 is usually considered statistically significant.

Hypothesis testing
[Figure: difference between measurements made by echocardiographers and emergency physicians]
• On average echocardiographers’ estimates were higher by 4.4%.
• P = 0.003 (t-test): statistically significant

Statistical vs. clinical significance
• But the authors note, “Although statistically significant, the difference of 4.4% in the estimation of SF may not be clinically relevant.”
• A statistically significant finding is not always clinically significant.
• Subject area judgement is always needed.

Red flag
Treating a statistically significant finding as important without considering whether it is clinically relevant
Statistical significance only indicates that chance alone is an unlikely explanation for the results. It does not necessarily mean that the results are clinically significant.

Beyond the p-value
• The mean sample difference is 4.4%.
• The mean population difference could be larger or smaller.
• We know (with 95% confidence) that the population difference is greater than zero. Do we know anything more?

Confidence intervals
• Yes! A much more useful result than a p-value is a confidence interval.
• A confidence interval tells us what population values (of the difference in means) are consistent with our data.
• Values outside the confidence interval are ruled out at the stated confidence level.
• The key issue is the clinical relevance of the values contained in a confidence interval.

Hypothesis testing
[Figure: difference between measurements made by echocardiographers and emergency physicians]
• On average echocardiographers’ estimates were higher by 4.4%.
• The 95% confidence interval is 1.6% to 7.2%.

“Statistics means never having to say you’re certain.”

Hypothesis testing
[Table: inferior vena cava (IVC) diameter (mm) for each patient as measured by the emergency physician ("Emerg Doc") and by the echocardiographer ("Cardiographer")]

Hypothesis testing
[Figure: difference between measurements made by echocardiographers and emergency physicians]
• On average echocardiographers’ estimates were higher by 0.068 mm.
• P = 0.14: not statistically significant

Hypothesis testing
• Since p > 0.05, can we conclude that there is no difference between IVC measurements made by echocardiographers and emergency physicians?
• No! The confidence interval can help us understand why not.

Hypothesis testing
[Figure: difference between measurements made by echocardiographers and emergency physicians]
• On average echocardiographers’ estimates were higher by 0.068 mm.
• The 95% confidence interval is -0.025 to 0.16 mm.

Is the null hypothesis true?
• It is not necessary (or even possible) to know if the null hypothesis is exactly correct.
• For instance, it is not necessary to know if a difference is exactly zero.
• Rather, it is sufficient to be confident that a difference is small enough that it is negligible or unimportant.
• Who should decide what is unimportant?
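The way a confidence interval relates to the null value, to the p-value, and to a "negligible" difference can be sketched in a few lines. This is an illustration rather than the authors' analysis: the p-value function assumes a symmetric interval based on the normal distribution (the paper reports a t-test, so the results will only be close), and the negligibility margin is a hypothetical input that a subject-area expert would have to supply.

```python
from statistics import NormalDist


def interpret_interval(lower, upper, null_value=0.0):
    """Report whether a confidence interval includes the null value."""
    if lower <= null_value <= upper:
        return "includes the null value: consistent with no difference"
    return "excludes the null value: statistically significant"


def approx_p_from_ci(estimate, lower, upper, null_value=0.0, level=0.95):
    """Back out an approximate two-sided p-value from an estimate and its CI.

    Assumption: the interval is estimate +/- z * standard error, with z
    taken from the normal distribution. This is an approximation only.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    std_error = (upper - lower) / (2 * z_crit)
    z = abs(estimate - null_value) / std_error
    return 2 * (1 - NormalDist().cdf(z))


def all_values_negligible(lower, upper, margin):
    """True if every value in the interval lies within +/- margin.

    The margin is a hypothetical clinical judgement, not a statistical one.
    """
    return -margin < lower and upper < margin


if __name__ == "__main__":
    # Figures reported in the slides: SF (%) and IVC diameter (mm).
    for name, est, lo, hi in [("SF", 4.4, 1.6, 7.2), ("IVC", 0.068, -0.025, 0.16)]:
        print(name, interpret_interval(lo, hi),
              f"(approximate p = {approx_p_from_ci(est, lo, hi):.3f})")
```

A call such as `all_values_negligible(-0.025, 0.16, margin=0.2)` asks whether the whole IVC interval lies inside a hypothetical margin of 0.2 mm; the answer depends entirely on the margin chosen, which is the point of the question "Who should decide what is unimportant?"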
Red flag
Claiming there is no difference because p > 0.05
The sample size of the study may not be sufficient to detect a difference, or the difference may be small (but possibly still important).

Association
• The example examined the question of whether type of training (emergency physician or echocardiographer) was associated with the measured value of SF or IVC.
• Many common statistical methods focus on estimating or testing associations.

Measures and tests of association
• Student’s t-test, Wilcoxon test
• chi-square test, Fisher’s exact test
• log rank test
• Pearson and Spearman correlations
• absolute risk reduction
• relative risk
• odds ratio
• number needed to treat
…

Measures of association for dichotomous outcomes

Group A   Group B   Absolute risk reduction   Relative risk     Odds ratio
50%       30%       20% (NNT=5)               0.6 (RRR=40%)     0.43
5%        3%        2% (NNT=50)               0.6 (RRR=40%)     0.59
25%       5%        20% (NNT=5)               0.2 (RRR=80%)     0.16

The slippery slope of causation
• When there is an association, we may be tempted to ascribe causation. For example:
  • “The new surgical technique produced quicker recovery.”
• Compare with:
  • “Patients who received the new surgical technique recovered quicker.”
• Perhaps the patients who received the new surgical technique had less serious conditions.
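Returning to the measures of association for dichotomous outcomes tabulated earlier, the arithmetic behind them can be sketched as follows. The function name is illustrative, and the direction of comparison (group B relative to group A) is an assumption chosen because it reproduces the tabulated values.

```python
def association_measures(risk_a, risk_b):
    """Compare group B with group A for a dichotomous outcome.

    risk_a and risk_b are the proportions with the outcome in each group.
    Both must lie strictly between 0 and 1, and they must differ so that
    the number needed to treat is defined.
    """
    arr = risk_a - risk_b              # absolute risk reduction
    rr = risk_b / risk_a               # relative risk
    odds_a = risk_a / (1 - risk_a)     # odds of the outcome in group A
    odds_b = risk_b / (1 - risk_b)     # odds of the outcome in group B
    return {
        "ARR": arr,
        "NNT": 1 / arr,                # number needed to treat
        "RR": rr,
        "RRR": 1 - rr,                 # relative risk reduction
        "OR": odds_b / odds_a,         # odds ratio
    }


if __name__ == "__main__":
    # The three pairs of risks shown in the slide's table.
    for risk_a, risk_b in [(0.50, 0.30), (0.05, 0.03), (0.25, 0.05)]:
        m = association_measures(risk_a, risk_b)
        print(f"A={risk_a:.0%} B={risk_b:.0%}: "
              f"ARR={m['ARR']:.0%} NNT={m['NNT']:.0f} "
              f"RR={m['RR']:.1f} RRR={m['RRR']:.0%} OR={m['OR']:.2f}")
```

The first two rows share a relative risk of 0.6 but differ tenfold in absolute risk reduction, so the number needed to treat moves from 5 to 50. Relative measures alone can hide how much the baseline risk matters.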
The slippery slope of causation
• Be very cautious with conclusions about causality.
• A well-executed randomized trial provides the most solid grounds for causal inferences.

“Correlation does not imply causation.”

Red flag
Causal inferences in an observational study
An observational study can detect an association, but not (by itself) causation.

Raw and adjusted analyses
• In the analysis of the surgery data, it might be wise to take into account the severity of each patient’s condition.
• This would give “adjusted” results.
• Terminology: Unadjusted results are often called “raw” or “crude” results.
• Adjusted analyses are often used to account for imbalances between groups.

Raw and adjusted analyses
• But remember: an adjusted association is still just an association.
• Remain cautious of causal inferences!

Adjusted analyses
• linear regression
• logistic regression
• Poisson regression
• Cox proportional hazards regression
• Cochran-Mantel-Haenszel methods
…

Some things I haven’t discussed
• Biases
• Study designs
• Power and sample size determination
• Other statistical methods
  • evaluation of diagnostic tests
  • meta-analysis
  • multi-level/hierarchical models
  • assessment of measurement reliability
…

Summary
• Variability leads to uncertainty.
• Statistics uses probability theory to model variability and quantify uncertainty.
• Confidence intervals are more informative than p-values, and help to assess clinical significance.
• Many common statistical methods focus on estimating and testing associations.
• Be careful about ascribing causation.
• To account for other factors, statistical methods are available for adjusting associations.

“Variability is the spice of life.”