Statistics for Health Research
Assessing the Evidence: Statistical Inference
Peter T. Donnan, Professor of Epidemiology and Biostatistics

Objectives of Session
• Understand the idea of inference
• Confidence interval approach
• Significance testing
• Briefly: some simple tests

Statistical Inference
The aim is to draw conclusions (INFER) from the specific (sample) to the more general (population).
Are differences between groups chance occurrences, or do they represent statistically significant results (i.e. real differences)?

Extrapolating from the sample to the population
Illustrations: Ian Christie, Orthopaedic & Trauma Surgery, Copyright 2002 University of Dundee

Two approaches: confidence intervals and hypothesis testing

Confidence Intervals
Random variability means that there is statistical (random) variation around any summary statistic: mean, proportion, difference between means, etc.

Confidence intervals
Uncertainty is expressed as a confidence interval defined by an upper and a lower value:
Summary statistic ± constant × standard error, e.g.
For a 95% CI, constant = 1.96 (from the Normal distribution)

Confidence intervals
For a percentage the standard error is given by:
$$ se = \sqrt{\frac{p(100 - p)}{n}} $$
So for p = 35% and n = 100, se = 4.8%.

Confidence intervals
Consider a prevalence of 35% for the uptake of statins for secondary prevention of MI from one practice:
prevalence = 35%, 95% CI = 35% ± 1.96 × 4.8% = 25.6% to 44.4%

Confidence intervals
For a mean the standard error is given by:
$$ se = \frac{s}{\sqrt{n}} $$
where s is the standard deviation of the distribution and n is the sample size.

Confidence intervals
Consider a mean cholesterol measurement of 5.4 mmol/l for a group of 100 patients with type 2 diabetes, with standard deviation s = 1.1 mmol/l:
mean = 5.4, 95% CI = 5.4 ± 1.96 × 1.1/√100 = 5.2 to 5.6 mmol/l

Confidence intervals
Confidence intervals give an estimate of the precision of a summary statistic.
[Figure: a precise estimate contrasted with an imprecise estimate]
The major determinant of precision is sample size.

Confidence intervals
Warning!
Confidence intervals are usually interpreted in a Bayesian way, even though a frequentist method is used to estimate them.
The probability of the true value lying within the confidence interval is NOT, repeat NOT, 95%.
Bayesian confidence intervals are called CREDIBLE INTERVALS, and the probability of the true value lying in the credible interval IS 95%.

Frequentist Confidence Interval means that with repeat samples…
[Figure: study sample with its 95% CI, followed by repeat samples, ad infinitum]
A 95% confidence interval means that 95% of the intervals constructed from repeat studies would contain the true proportion.

To put it another way, the 95% Confidence Interval is…
[Figure: sample with its 95% CI, followed by repeat samples, ad infinitum]
…one of many that could be constructed with the assurance that 95% of the time the true value of the parameter would be included.

Statistical Inference: Hypothesis testing
Are differences between groups chance occurrences, or do they represent statistically significant results (i.e. real differences)?
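As an aside on the confidence interval arithmetic from the earlier slides, the two worked examples (statin uptake and mean cholesterol) can be checked with a short Python sketch. The function names `ci_percentage` and `ci_mean` are illustrative and not part of the session material. The code keeps the standard error unrounded, so the percentage limits can differ in the first decimal place from the slide values, which round se to 4.8% first.

```python
import math

Z_95 = 1.96  # Normal-distribution constant for a 95% CI, as on the slides


def ci_percentage(p, n, z=Z_95):
    """Return (se, lower, upper) for a percentage p (0-100) observed in n subjects."""
    se = math.sqrt(p * (100 - p) / n)
    return se, p - z * se, p + z * se


def ci_mean(mean, sd, n, z=Z_95):
    """Return (se, lower, upper) for a sample mean with standard deviation sd."""
    se = sd / math.sqrt(n)
    return se, mean - z * se, mean + z * se


if __name__ == "__main__":
    # Statin uptake: prevalence 35% in a practice sample of n = 100
    se, lower, upper = ci_percentage(35, 100)
    print(f"percentage: se = {se:.1f}%, 95% CI = {lower:.1f}% to {upper:.1f}%")

    # Cholesterol: mean 5.4 mmol/l, s = 1.1 mmol/l, n = 100
    se, lower, upper = ci_mean(5.4, 1.1, 100)
    print(f"mean: se = {se:.2f}, 95% CI = {lower:.1f} to {upper:.1f} mmol/l")
```

Quadrupling n halves the standard error in both functions, which is one way to see why sample size is the major determinant of precision.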
The process of inference starts from a neutral position: the Null Hypothesis.

Statistical Inference: Hypothesis testing
The null hypothesis (H0) is usually set to ‘there is no difference’.
Collect data and carry out hypothesis tests.
Reject, or fail to reject, the null hypothesis.

Legal analogy: Hypothesis testing
Legal trial: Defendant assumed innocent until proved guilty
Hypothesis test: Null hypothesis assumes no difference between groups

Legal analogy: Hypothesis testing
Legal trial: Examine evidence
Hypothesis test: Calculate test statistic based on evidence from sample data

Legal analogy: Hypothesis testing
Legal trial:
1. Accept evidence proves guilt
2. Evidence does not prove guilt (‘not proven’)
Hypothesis test:
1. Accept significant difference between groups
2. Insufficient evidence to reject H0

Legal analogy: Hypothesis testing
No statistical significance is not the same as no difference.

Statistical Inference: Hypothesis testing
The test statistic generally consists of:
$$ \text{test statistic} = \frac{\text{summary statistic} - H_0 \text{ value}}{\text{standard error of summary statistic}} $$
e.g. test that the mean is different from zero:
$$ t = \frac{\text{Mean} - 0}{se(\text{Mean})} $$

Statistical Inference: Hypothesis testing
The test statistic is then compared with tabulated values of a distribution (e.g. Normal distribution, t-distribution).
Assuming the null hypothesis is true, what is the probability of obtaining the actual observed value of the test statistic, t?
How likely is the value of t to have occurred by chance alone?

Statistical Inference: Hypothesis testing
Assuming the null hypothesis is true, what is the probability of obtaining the actual observed or greater value of the test statistic, t?
Using the distribution of t, which is similar to a Normal distribution, this probability can be obtained as p = 0.042.
[Figure: t distribution with 2.1% in each tail]

Statistical Inference: Hypothesis testing
If the probability of the occurrence of the observed value is < 5% (p < 0.05), then this is unlikely to be a chance finding.
The result is declared statistically significant.
Fortunately, most statistical software (e.g. SPSS) will carry out the test you request and give p-values (SPSS labels these as ‘Sig’).

Two group hypothesis testing
We will consider three common tests:
1. t-test for difference between two means
2. Chi-squared test (χ²) for difference between two proportions
3. Logrank test for difference between two groups' median survival
All are easily carried out in SPSS.

Are practices with access to community hospitals further away on average from general hospitals?
             No access to CH    Access to CH
n            17                 10
Mean         8.68 km            21.30 km
SD           11.90 km           5.68 km
Se (mean)    2.89               1.79

Example t-test
$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{s_p \sqrt{1/n_1 + 1/n_2}} $$
where 1 and 2 refer to the two groups, n is the number in each group, $\bar{x}$ refers to the mean and $s_p$ is the pooled standard deviation.

Example t-test
$$ t = \frac{(8.68 - 21.30) - 0}{10.112 \times 0.398} = \frac{-12.62}{4.024} = -3.13 $$
With 25 degrees of freedom, from t-tables p = 0.004, and so the difference of 12.62 km is highly statistically significant.

Consider a recent RCT
• Rimonabant vs. placebo to reduce body weight in obese people (BMI > 30 kg/m²)
• Rimonabant (20 mg daily) inhibits the effects of cannabinoid agonists, which in turn affects energy balance
• Mean reduction in body weight at one year was 6.6 kg vs. 1.8 kg (rimonabant vs. placebo)
• Difference was 4.7 kg (95% CI 4.1, 5.4)
• By end of year 2 mean weight was back to start!

Are practices with access to community hospitals more likely to have training status?
                      Community Hospital: No    Community Hospital: Yes
No training status    12 (71%)                  4 (40%)
Training status       5 (29%)                   6 (60%)

Are practices with access to community hospitals more likely to have training status?
• Is the difference in proportions, 60% - 29% = 31%, well within the realms of chance or a statistically significant finding?
• Null hypothesis: difference = 0
• Use chi-squared (χ²) test for significance of difference

Pearson Chi-Squared Test
                      Comm. Hosp. No    Comm. Hosp. Yes    Total
No training status    a                 b                  a+b
Training status       c                 d                  c+d
Total                 a+c               b+d                N

Pearson Chi-Squared Test
$$ \chi^2 = \frac{N(|ad - bc|)^2}{(a+b)(c+d)(a+c)(b+d)} $$
where N = a+b+c+d and |ad - bc| means take the positive value of the calculation.

Pearson Chi-Squared Test
$$ \chi^2 = \frac{27(72 - 20)^2}{16 \times 11 \times 17 \times 10} = 2.44 $$
with 1 degree of freedom: df = (no. rows - 1) × (no. columns - 1)
P = 0.118, which is not statistically significant.

More complicated analyses
• Introduced simple two-group tests
• Results of more complicated analyses are expressed in the same way
• Summary statistic and 95% confidence interval
• Usually p-value is also stated but often implicit from the confidence interval
• Beware spurious significance, e.g. p = 0.034729 (3 d.p. are enough)
• ‘Importance’ refers to size of difference
• An ‘important’ result can be statistically non-significant

Sacred 5% level
• 5% level is arbitrary
• Practical choice before the computer era to make tables easier to construct
• Are p = 0.046 and p = 0.051 different?
• In the past researchers tended to present only p-values
• Now the emphasis is on size of effect and 95% CI
• Unfortunately, editors are still influenced by p-values, leading to publication bias

Summary
• Do not get carried away by p-values
• Interpretation requires knowledge of the area to put results into context, but also an understanding of what the tests do
• A p-value close to 5% is approaching significance and may suggest it is worth investigating
• The size of the effect is more clinically or scientifically important
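
Worked check of the two examples
The t-test and chi-squared examples from this session can be reproduced from their summary figures with a minimal Python sketch using only the standard library. The session itself uses SPSS and t-tables; the function names here are illustrative assumptions. The t-test p-value is approximated by Simpson's rule on the t density as a stand-in for table lookup, and the chi-squared p-value uses an identity that holds for 1 degree of freedom only.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled SD. Returns (t, df, pooled_sd)."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    t = ((mean1 - mean2) - 0) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, df, sp


def t_two_sided_p(t, df, steps=2000):
    """Approximate two-sided p-value by integrating the t density from 0 to |t|
    with Simpson's rule (steps must be even)."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps
    total = pdf(0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 1 - 2 * (total * h / 3)


def pearson_chi2(a, b, c, d):
    """Pearson chi-squared for a 2x2 table (no continuity correction).
    Returns (chi2, p); the p-value formula is valid for 1 df only."""
    n = a + b + c + d
    chi2 = n * abs(a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


if __name__ == "__main__":
    # Distance to general hospital: no access to CH vs. access to CH
    t, df, sp = pooled_t(8.68, 11.90, 17, 21.30, 5.68, 10)
    print(f"pooled SD = {sp:.3f}, t = {t:.2f}, df = {df}, p = {t_two_sided_p(t, df):.3f}")

    # Training status by community hospital access: a=12, b=4, c=5, d=6
    chi2, p = pearson_chi2(12, 4, 5, 6)
    print(f"chi-squared = {chi2:.2f}, p = {p:.3f}")
```

Both tests follow the general pattern from the session: a summary statistic minus its null value, scaled by its variability, then compared with a reference distribution.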