Insights into the black box of statistics

What do you think about a doctor who uses the wrong treatment, either wilfully or through ignorance, or who uses the right treatment wrongly (such as by giving the wrong dose of a drug)? Most people would agree that such behaviour is unprofessional, arguably unethical, and certainly unacceptable.
Derived from: Altman DG. The Scandal of Poor Medical Research. BMJ, 1994; 308:283

What do you think about researchers who use the wrong techniques (either wilfully or in ignorance), use the right techniques wrongly, misinterpret their results, report their results selectively or draw unjustified conclusions? We should be appalled… but numerous studies of the medical literature have shown that all of the above phenomena are common.
Derived from: Altman DG. The Scandal of Poor Medical Research. BMJ, 1994; 308:283

Understanding your results
Research Talk 2015
Dr Emily Karahalios
emily.karahalios@unimelb.edu.au
Office for Research, Western Centre for Health Research & Education
Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne

Overview
• Defining your research question
– PICOS
• Describing data
• Understanding the results
– Estimates reported in the literature
– Interpreting 95% confidence intervals and p-values ~ Statistical Inference

Research question
Participants / population
• Neonates
Intervention / exposure
• 14 day administration of antenatal corticosteroids
Comparison
• 7 day administration of antenatal corticosteroids
Outcome
• Neonatal mortality and neonatal morbidity
Study design
• RCT

Research question
Murphy et al. The Lancet, 2008; 372:2143-2151.
Research question
Participants / population
• Women at high risk of preterm birth
Intervention / exposure
• 14 day administration of antenatal corticosteroids
Comparison
• 7 day administration of antenatal corticosteroids
Outcome
• Neonatal mortality and neonatal morbidity
Study design
• RCT

Study designs
The general idea…
– Evaluate whether a risk factor (or preventative factor) increases (decreases) the risk of an outcome (e.g. disease, death, etc)
[Diagram: timeline from exposure to outcome]

Summarising the data
Murphy et al. The Lancet, 2008; 372:2143-2151.

Summarising the data
Dreyfus et al. Journal of Pediatrics, 2015 online.

Summarising the data
Table 1: Baseline characteristics for 262 patients
                           Number (%) or mean [SD] or median {IQR}
Sex
  Males                    128 (51.2)
  Females                  134 (48.9)
Age (years)                49.8 [2.1]
Country of birth
  Australia/NZ             105 (40.1)
  Elsewhere                157 (59.9)
Length of stay (days)      4 {4.0}
Abbreviations: IQR = Inter-quartile range, SD = standard deviation, NZ = New Zealand

Summarising the data
Numerical
– Continuous (age, weight, height)
– Discrete (length of stay, # of hospital visits)
Categorical
– Nominal (sex, blood group)
– Ordinal (tumour stage, quintile of SES)

Summarising the data
• Which variables are categorical?
– Sex (Male/Female)
– Country of birth (Australia/Elsewhere)
• Which variables are continuous?
– Age (years)
– Length of stay (days)

Summarising the data
[Figure: histogram of Age (years), x-axis from 44 to 54, y-axis Density. Stata command: histogram Age]

Summarising the data
Mean = 49.8 years
Standard deviation = $\sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$ = 2.1 years
Note, 95% of observations lie within approximately ±2×SD of the mean.
In this example, 95% of observations lie within 45.6 and 54.0 years.
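As a minimal sketch of the standard deviation formula and the ±2×SD rule of thumb above, the following Python computes both. The `ages` list is hypothetical illustration data (not the study sample), and the function names are our own; the rule of thumb is only reasonable when the histogram looks roughly normal.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))


def two_sd_range(mean, sd):
    """Approximate range holding 95% of observations for roughly normal data."""
    return mean - 2 * sd, mean + 2 * sd


# Hypothetical ages in years, for illustration only.
ages = [47.1, 48.3, 49.0, 49.8, 50.2, 51.0, 52.4]

# The hand-written formula should agree with the standard library.
print(sample_sd(ages), statistics.stdev(ages))

# Summary values quoted on the slide: mean 49.8 years, SD 2.1 years.
low, high = two_sd_range(49.8, 2.1)
print(f"{low:.1f} to {high:.1f} years")
```

Passing the slide's summary values to `two_sd_range` gives the same limits as the mean minus and plus twice the standard deviation computed by hand.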
Summarising the data
[Figure: histogram of Length of stay (days), x-axis from 0 to 35, y-axis Density. Stata command: hist LOS]

Summarising the data
[Figure: histogram of Length of stay with a normal density curve overlaid. Stata command: hist LOS, normal]

Summarising the data
Mean = 5 days
Median = 50th percentile = 4 days
Mean is not a good measure of central tendency and standard deviation is not a good measure of spread for a skewed distribution.
Note, 95% of observations lie within approximately ±2SD of the mean. In this example, 95% of observations lie within -4.8 and 14.8 days BUT they don’t because LOS can’t be negative!
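To illustrate why the mean drifts away from the median for a right-skewed variable such as length of stay, here is a small sketch using Python's `statistics` module. The `los_days` values are hypothetical and chosen only to be skewed; they are not the data behind the slides, and `summarise_skewed` is our own helper name.

```python
import statistics


def summarise_skewed(values):
    """Return mean, median and quartiles; median and IQR suit skewed data."""
    # quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": median,
        "q1": q1,
        "q3": q3,
    }


# Hypothetical lengths of stay in days, with a long right tail.
los_days = [1, 2, 2, 3, 3, 4, 4, 4, 5, 6, 6, 7, 9, 14, 30]

summary = summarise_skewed(los_days)
print(summary)
```

In this made-up sample a few long stays pull the mean above the median, while the quartiles stay within the bulk of the data. Note that different software packages use slightly different quartile conventions, so quartiles from Stata may not match these exactly.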
Summarising the data
Median = 50th percentile = 4 days
Inter-quartile range (IQR) = lower quartile to upper quartile = 25th percentile to 75th percentile = 2 to 6 days

Summarising the data
Table 1: Baseline characteristics for 262 patients
                           Number (%) or mean [SD] or median {IQR}
Sex
  Males                    128 (51.2)
  Females                  134 (48.9)
Age (years)                49.8 [2.1]
Country of birth
  Australia/NZ             105 (40.1)
  Elsewhere                157 (59.9)
Length of stay (days)      4 {2, 6}
Abbreviations: IQR = Inter-quartile range, SD = standard deviation, NZ = New Zealand

Summarising the data
[Figure: distributions annotated with central tendency and spread]

Summarising the data
[Figure: examples of positive skew and negative skew]

Summarising the data
Data variable - numerical: plot histogram
• Normally distributed
– Mean
– Standard deviation
– Minimum-maximum
• NOT normally distributed
– Unimodal: median, inter-quartile range, minimum-maximum
– Multimodal: categorise variable
Simpson et al. J Fam Plan and Rep Health Care, 2001; 27:234-236.

Summarising the data
Numerical
– Continuous (age, weight, height): normally distributed or skewed
– Discrete (length of stay, # of hospital visits)
Categorical
– Nominal (sex, blood group)
– Ordinal (tumour stage, quintile of SES)
Absolutely critical to choosing the appropriate form of statistical analysis

Estimates reported in the literature
– Risk differences
– Odds ratios / risk ratio – logistic regression
– Beta-coefficients – linear regression

Measures of association – binary outcome
Binary variables – two categories only (also termed dichotomous variables)
Examples:
• Outcome – diseased or healthy; alive or dead
• Exposure – male or female; smoker or non-smoker; treatment or control group

Comparing two proportions
                       With outcome    Without outcome    Total
                       (diseased)      (disease free)
Exposed (group 1)      d1              h1                 n1
Unexposed (group 0)    d0              h0                 n0
Total                  d               h                  n
• Proportion of all subjects experiencing outcome, p = d/n
• Proportion of exposed group, p1 = d1/n1
• Proportion of unexposed group, p0 = d0/n0

Comparing two proportions - TBM Trial
Adults with tuberculous meningitis randomly allocated into 2 treatment groups:
1. Dexamethasone
2. Placebo
Outcome measure: Death during 9 months following start of treatment.
Research question: Can treatment with dexamethasone reduce the risk of death among adults with tuberculous meningitis?
Thwaites et al 2004

Comparing two proportions
Death during 9 months post start of treatment
Treatment group            Yes    No     Total
Dexamethasone (group 1)    87     187    274
Placebo (group 0)          112    159    271
Total                      199    346    545
Thwaites et al 2004

Comparing two proportions - TBM Trial
Measure of effect      Formula
Risk difference        p1 - p0
Risk ratio (RR)        p1/p0
Odds ratio (OR)        (d1/h1)/(d0/h0)
When there is no association between exposure and outcome:
– Risk difference = 0
– Risk ratio (RR) = 1
– Odds ratio (OR) = 1

Comparing two proportions
Death during 9 months post start of treatment
Treatment group            Yes         No          Total
Dexamethasone (group 1)    87 (d1)     187 (h1)    274 (n1)
Placebo (group 0)          112 (d0)    159 (h0)    271 (n0)
Total                      199         346         545
Risk difference = p1 - p0 = (87/274) - (112/271) = -0.096
Risk ratio = p1/p0 = (87/274)/(112/271) = 0.77
Odds ratio = (d1/h1)/(d0/h0) = (87/187)/(112/159) = 0.66
Thwaites et al 2004

Linear regression
Dreyfus et al. Journal of Pediatrics, 2015 online.
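Returning to the TBM trial table (Thwaites et al 2004), the three measures of effect can be reproduced with a few lines of Python. This is a sketch of the formulas as stated on the slides; the function name `measures_of_effect` and the dictionary keys are our own.

```python
def measures_of_effect(d1, h1, d0, h0):
    """Risk difference, risk ratio and odds ratio from a 2x2 table.

    d1, h1: exposed group with / without the outcome
    d0, h0: unexposed group with / without the outcome
    """
    n1, n0 = d1 + h1, d0 + h0
    p1, p0 = d1 / n1, d0 / n0  # risk in each group
    return {
        "risk_difference": p1 - p0,
        "risk_ratio": p1 / p0,
        "odds_ratio": (d1 / h1) / (d0 / h0),
    }


# Counts from the TBM trial table: dexamethasone is group 1, placebo is group 0.
tbm = measures_of_effect(d1=87, h1=187, d0=112, h0=159)
for name, value in tbm.items():
    print(f"{name}: {value:.3f}")
```

A negative risk difference and ratios below 1 all point the same way: a lower risk of death in the dexamethasone group than in the placebo group.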
Linear regression
There are four assumptions underlying our linear regression model:
• Linearity (outcome and exposure)
• Normality (residual variation)
• Independence (of observations)
• Homoscedasticity (constant variance)

Statistical Inference
We follow a standard four-step process
1) Sample size
2) Estimate of the effect size
3) Calculate a confidence interval
4) Derive a p-value to test the hypothesis of no association

Statistical Inference
Definition of a confidence interval
REMEMBER! If we were to draw several independent, random samples (of equal size) from the same population and calculate 95% confidence intervals for each of them, then on average 19 out of every 20 (95%) such confidence intervals would contain the true population proportion, and one of every 20 (5%) would not.
[Figure: sample proportion and 95% CI for each of 20 samples; population proportion = 0.16 (16%)]

Statistical Inference
P-value
• How likely is it we would see a difference this big IF there was NO real difference between the populations?
• What is the probability (P-value) of finding the observed difference IF the null hypothesis is true?

Statistical Inference
Interpretation of p-values
[Figure: P-value scale running from 1 through 0.1, 0.01 and 0.001 to 0.0001. Weak evidence against the null hypothesis at large P-values; increasing evidence against the null hypothesis with decreasing P-value; strong evidence against the null hypothesis at small P-values]

Statistical Inference
Overweight and obese adults living in the UK
300 adults participating in a RCT comparing 2 dietary interventions
Mean weight loss after 4 weeks
• Atkins group: 4.40 kg
• Weight Watchers group: 2.86 kg
Source: Truby H et al. BMJ 2007

Statistical Inference
Example: Randomised controlled trial of weight loss programmes in the UK
Weight loss after 4 weeks (kg)
Group              n     Sample mean    Sample standard deviation    Sample standard error
Atkins             57    4.40           2.45                         0.32
Weight Watchers    58    2.86           2.23                         0.29
1) Estimate of difference in population mean weight loss after 4 weeks between Atkins & Weight Watchers groups = 4.40 – 2.86 = 1.54 kg
2) 95% CI: 0.67 kg to 2.41 kg
Source: Truby H et al. BMJ 2007

Statistical Inference
Interpretation
1) We found a difference of 1.54 kg in mean weight loss after 4 weeks between the Atkins & Weight Watchers diet groups.
2) From the 95% confidence interval, the true difference could be as much as 2.41 kg (much greater weight loss for the Atkins diet) or as little as 0.67 kg (marginally greater weight loss for the Atkins diet compared with Weight Watchers).

Statistical Inference
Null hypothesis – There is no difference in the population mean weight loss after 4 weeks between the Atkins and Weight Watchers groups
2-sided p-value <0.001
Thus the probability of observing a difference of at least 1.54 kg in the sample means of the two groups, assuming the null hypothesis is true, is <0.001 or <0.1%.
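The slides do not say how the 95% confidence interval for the weight-loss difference was calculated. One candidate, shown here as a sketch, is the pooled-variance two-sample t interval built from the group sizes, means and standard deviations in the table; whether the original analysis used exactly this method is an assumption. The critical value 1.981 is an approximate 97.5th percentile of the t distribution with 113 degrees of freedom, hard-coded to keep the example dependency-free, and the p-value uses a normal approximation rather than the exact t distribution.

```python
import math


def pooled_t_interval(n1, mean1, sd1, n2, mean2, sd2, t_crit):
    """Difference in means with a pooled-variance confidence interval."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # SE of the difference
    diff = mean1 - mean2
    return diff, se, diff - t_crit * se, diff + t_crit * se


# Assumed constant: approximate t critical value for 113 degrees of freedom.
T_CRIT_113_DF = 1.981

# Atkins (n=57) versus Weight Watchers (n=58), values from the slide table.
diff, se, lower, upper = pooled_t_interval(
    57, 4.40, 2.45, 58, 2.86, 2.23, T_CRIT_113_DF
)
print(f"difference = {diff:.2f} kg, 95% CI {lower:.2f} to {upper:.2f} kg")

# Rough two-sided p-value from the normal approximation to the test statistic.
p_approx = math.erfc(abs(diff / se) / math.sqrt(2))
print(f"approximate two-sided p-value = {p_approx:.5f}")
```

Under these assumptions the interval lands close to the limits quoted on the slide, and the approximate p-value is consistent with the reported p < 0.001.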
Statistical Inference
Presenting the results
1) Sample size
300 adults participating in a RCT comparing 2 dietary interventions
2) Estimate of the effect size
Difference in mean weight loss after 4 weeks, Atkins group compared with Weight Watchers: 1.54 kg
3) Calculate a confidence interval
95% CI for difference in population means: 0.67 kg to 2.41 kg
4) Derive a p-value to test the hypothesis of no association
P-value < 0.001
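The repeated-sampling definition of a 95% confidence interval given earlier in the talk can be checked by simulation. The sketch below draws many samples from a population with true proportion 0.16 (the value in the slide's figure) and counts how often a simple normal-approximation (Wald) interval contains it. The sample size of 200, the number of repetitions, the seed and the choice of the Wald interval are all our assumptions, not details from the talk.

```python
import math
import random


def wald_ci(successes, n, z=1.96):
    """Normal-approximation 95% CI for a proportion."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def coverage(true_p, n, n_samples, seed=0):
    """Fraction of simulated samples whose CI contains the true proportion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Each subject has the outcome with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_ci(successes, n)
        hits += low <= true_p <= high
    return hits / n_samples


estimated_coverage = coverage(true_p=0.16, n=200, n_samples=2000)
print(f"Proportion of intervals containing 0.16: {estimated_coverage:.3f}")
```

The estimated coverage should come out near, though not exactly at, 95%, which is the "19 out of every 20" statement expressed over many more than 20 samples.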
