Introduction to Univariate Statistics MSc in Pharmaceutical Medicine Statistics & Data Management Module Irene Rebollo-Mesa Senior Lecturer in Trials King’s Clinical Trials Unit, Biostatistics Institute of Psychiatry, King’s College London Irene.rebollo_mesa@kcl.ac.uk Outline 1 Comparing 2 Independent Samples 2 Comparing differences in a Paired Sample 3 Compare Several Independent Samples 4 Conclusions 2 1 Comparing 2 Independent Samples 3 Univariate Statistical Methods: Comparing 2 Independent Samples Design or Aim of Study Type of Outcome Data/Assumptions Statistical Method COMPARE TWO INDEPENDENT SAMPLES Compare Two Means Continuous, Normal dist. Independent Samples T-test Compare Two Proportions Categorical, Binary, all >5 Chi-squared test Compare Two Proportions Categorical, Binary, some <5 Fisher’s exact test Compare Distributions Ordinal Wilcoxon 2-sample signed rank test (Mann Whitney U test) Compare Time to an Event Time to event in Two groups Logrank test 4 Independent Samples t-test Design or Aim of Study Type of Outcome Data/Assumptions Statistical Method COMPARE TWO INDEPENDENT SAMPLES Compare Two Means Continuous, Normal dist. Independent Samples T-test Compare Two Proportions Categorical, Binary, all >5 Chi-squared test Compare Two Proportions Categorical, Binary, some <5 Fisher’s exact test Compare Distributions Ordinal Wilcoxon 2-sample signed rank test (Mann Whitney U test) Compare Time to an Event Time to event in Two groups Logrank test 5 Independent Samples t-test The Two-sample t-test is based on the difference in sample means divided by the standard error (s.e.) of the difference in sample means: Difference in Sample Means t= s.e.(Difference in Sample Means) oUnder the Null Hypothesis (H0) we expect to see no difference in means. oThe larger the sample difference in means (either positive or negative) the more our data appear to depart from the H0. oThe standard error (=standard deviation / n) of the sample mean tells us how far from zero we might expect the sample mean to be under the N 6 Independent Samples t-test: Example Cocco G, Pandolfi S, Rousson V: Sufficient Weight Reduction Decreases Cardiovascular Complications in Diabetic Patients with the Metabolic Syndrome. A Randomized Study of Orlistat as an Adjunct to Lifestyle Changes (Diet and Exercise) Heart Drug 2005;5:68-74 90 patients with metabolic syndrome and diabetes Aged: >35 years; BMI: 31-40; LVEF: 42-50% Treatment: Daily exercise, diet and either Orlistat or Placebo Primary Outcome: Weight loss after diet of 6 months (kgs) Orlistat Placebo 0.8 3.0 7.4 7.9 8.6 3.1 8.6 10.8 6.0 3.6 6.0 6.1 7.9 6.0 3.0 8.7 6.2 3.2 4.2 7.6 16.0 3.1 6.9 5.6 8.3 7.7 4.4 4.6 7.3 11.6 13.7 7.3 7.8 6.7 -3.9 1.8 3.3 2.3 1.1 5.0 -0.8 7.0 4.3 -2.8 -3.4 -0.9 1.5 3.4 5.6 5.2 6.4 2.9 5.6 3.7 1.7 2.2 3.8 5.5 0.7 4.6 1.4 2.0 -0.2 4.9 1.9 5.7 2.0 4.5 3.4 4.5 3.8 2.9 2.5 3.3 1.5 1.5 5.9 4.9 4.5 -2.1 -0.5 2.2 1.6 -0.6 0.0 -1.5 6.3 1.7 -6.2 -1.9 7 Independent Samples t-test: Example Histograms 8 Independent Samples t-test: Example Boxplots 9 Independent Samples t-test: Example Rationale • Patients can be expected to respond differently • Orlistat does not reduce weight in every patient • Some patients who receive Placebo lose weight. • The sample means for this experiment are : Orlistat: 5.41 kgs Placebo: 2.48 kgs NULL Hypothesis • “In the population of patients with metabolic syndrome and diabetes aged more than 35 the difference in mean weight changes at 8 months is zero kgs” • Patients can be expected to respond differently • Does the difference in sample means , 2.93 kgs, provide evidence that the null hypothesis is not true? 10 Independent Samples t-test: Example Calculation The Two-sample t-test is based on the difference in sample means divided by the standard error (s.e.) of the difference in sample means: Difference in Sample Means t= s.e.(Difference in Sample Means) Weight Change Example: Difference in Sample Means: 5.41 -2.48= 2.93 kgs s.e (Difference in Sample Means) : 0.72 kgs t = 2.93/0.72 = 4.09 : p-value=0.000096 11 Independent Samples t-test: Assumptions • Each sample is representative of the population • There are no difference between the samples other than the treatments • Measurements in each sample are independent of each other and of the measurements in the other sample • Measurements are Normally distributed in the population • The variances in each sample are the same • If the assumptions of the test are not met, the p-value may be misleading 12 Binary Outcome Data: Chi-squared test Design or Aim of Study Type of Outcome Data/Assumptions Statistical Method COMPARE TWO INDEPENDENT SAMPLES Compare Two Means Continuous, Normal dist. Independent Samples T-test Compare Two Proportions Categorical, Binary, all >5 Chi-squared test Compare Two Proportions Categorical, Binary, some <5 Fisher’s exact test Compare Distributions Ordinal Wilcoxon 2-sample signed rank test (Mann Whitney U test) Compare Time to an Event Time to event in Two groups Logrank test 13 Binary Outcome Data: Chi-squared test •H0 :Graft outcome does not differ between donor types. •Overall kidney failure rate = 178/819 =0.217 (21.7%) •If there were no difference between donor types we would expect the same rates in each group •In other words amongst ,e.g. the 541 Deceased HB inoculated we would EXPECT 541*0.217=117 with kidney failure •Similarly amongst the 175 living related we would EXPECT 175*0.217=38 with kidney failure 14 Binary Outcome Data: Chi-squared test •As a measure of how “far” the expected (E) outcomes are from the observed (O) outcomes we calculate (O - E)2 å •In this example : E ( 401 - 423) 2 (140 -117) 2 (151 -137) 2 (24 - 38) 2 423 + 117 + 137 38 ... = 21.298 •And this is referred to the chi-squared distribution (3 df [ncol-1*nrow1] ) giving a p-value < .001. 15 Binary Outcome Data: Interpreting Results •Expected frequencies, proportions and standardized residuals. 16 Binary Outcome Data: Size of Effect •Relative Risk between 2 groups: living vs. deceased. Most appropriate for prospective-cohort studies with complete follow up. H0: RR=1. RR=2.21 -> Kidney failure is 2.21 times higher in deceased donors than in living donors. n11 Rr = n 21 n1+ n 2+ p1 = p2 •Odds Ratio between 2 groups: living vs. deceased. n11 Most appropriate for retrospective or case-control studies. Better mathematical properties, can be adjusted using log.reg. It approximates RR for small prev. H0: OR=1. OR=2.63 -> Among deceased donors the odds of kidney failure is 2.63 times higher than in living donors Or = n21 p1 (1- p1 ) n12 = n22 p2 (1- p2 ) 17 Binary Outcome Data: Fisher’s Exact Test Design or Aim of Study Type of Outcome Data/Assumptions Statistical Method COMPARE TWO INDEPENDENT SAMPLES Compare Two Means Continuous, Normal dist. Independent Samples T-test Compare Two Proportions Categorical, Binary, all >5 Chi-squared test Compare Two Proportions Categorical, Binary, some <5 Fisher’s exact test Compare Distributions Ordinal Wilcoxon 2-sample signed rank test (Mann Whitney U test) Compare Time to an Event Time to event in Two groups Logrank test 18 Binary Outcome Data: Fisher’s Exact Test •Two Frequencies <5 chi-squared invalid. •Fisher’s exact only for 2x2 tables. Based on the probs. Associated with all possible tables based on marginal totals. (no simple formula…) 19 Binary Outcome Data: Non-Parametric Wilcoxon signed rank test Design or Aim of Study Type of Outcome Data/Assumptions Statistical Method COMPARE TWO INDEPENDENT SAMPLES Compare Two Means Continuous, Normal dist. Independent Samples T-test Compare Two Proportions Categorical, Binary, all >5 Chi-squared test Compare Two Proportions Categorical, Binary, some <5 Fisher’s exact test Compare Distributions Ordinal Wilcoxon 2-sample signed rank test (Mann Whitney U test) Compare Time to an Event Time to event in Two groups Logrank test 20 Binary Outcome Data: Non-Parametric Wilcoxon signed rank test • If the Normal assumptions are not met, you can try nonparametric statistics, which do not assume a particular family of distribution for the data. • Wilcoxon signed rank test (Mann-Whitney U test): It’s the non-parametric 2-sample T test . NHThere is no tendency for members of one population to exceed members of the other. 21 Non-Parametric Wilcoxon signed rank test 1. Rank the data ignoring groups 2. Add the ranks in each group separately to give T1 and T2 3. The test statistic is the smallest of the Ts If the NH were true, we would expect the two rank sums to be about the same The smaller the T, the lower the probability of the data arising by chance. The T is compared with the expected smallest T given the sample sizes of each group if smaller p<0.05 22 Weight Loss -6.2 -3.9 -3.4 -2.8 -2.1 -1.9 -1.5 -0.9 -0.8 -0.6 -0.5 -0.2 0 0.7 0.8 1.1 1.4 1.5 1.5 1.5 1.6 1.7 1.7 1.8 1.9 … Group Placebo Orlistrat Orlistrat Orlistrat Placebo Placebo Placebo Placebo Orlistrat Placebo Placebo Placebo Placebo Placebo Orlistrat Orlistrat Placebo Placebo Placebo Placebo Placebo Placebo Placebo Orlistrat Placebo Rank 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 19 19 19 21 23 23 24 25 Wilcoxon signed rank test: Orlistat Example N Placebo = 45; N Orlistat = 45 W=1,548, p = 0.0000428 23 Non-Parametric Tests: Be careful? • Non-parametric tests are less powerful • You can not get Confidence Intervals • They are useless with very small samples: It is not true that you should always use nonparametrics with small samples!!! • They are very useful with ordinal data because they are based on ranks. 24 2 Comparing Differences in a Paired Sample 25 Univariate Statistical Methods: Differences in a Paired Sample Design or Aim of Study Type of Outcome Data/Assumptions Statistical Method COMPARE DIFFERENCES IN A PAIRED SAMPLE Test Mean Difference Continuous, Normal dist. for differences Paired (matched) Samples T-test Compare Two paired Proportions Categorical, Binary McNemar’s test Distribution of Differences Ordinal, symmetrical distributions Wilcoxon matched paired test 26 Univariate Statistical Methods: Differences in a Paired Sample Design or Aim of Study Type of Outcome Data/Assumptions Statistical Method COMPARE DIFFERENCES IN A PAIRED SAMPLE Test Mean Difference Continuous, Normal dist. for Paired (matched) Samples Tdifferences test Compare Two paired Proportions Categorical, Binary Distribution of Differences Ordinal, symmetrical distributions McNemar’s test Wilcoxon matched paired test 27 Paired samples T-test paired-samples t-test uses the following test statistic: t = sample mean (difference) / s.e. of sample mean (difference) o Under the Null Hypothesis (H0) we expect to see a mean difference = 0. o The larger the sample mean difference (either positive or negative) the more our data appear to depart from the H0. o The standard error (=standard deviation / n) of the sample mean tells us how far from zero we might expect the sample mean to be under the N 28 Paired samples T-test: Example Effect of an Method of Inhalation on Urinary Albuterol Excretion Hindle et al (Chest, 1995) • 9 Subjects • Each inhaled 4x100 mg of Albuterol Using a MeteredDose Inhaler (MDI) and a Dry Powder Inhaler (DISK) • Outcome: 30-min Postinhalation Urinary Albuterol Excretion (% Inhaled Dose) • Crossover trial • Within Patient Variation Subject Period 1 Period 2 1 MDI DISK 2 DISK MDI 3 DISK MDI 4 MDI DISK 5 DISK MDI 6 MDI DISK 7 MDI DISK 8 DISK MDI 9 DISK MDI Paired samples T-test: Example: Difference in Urinary Excretion Differen ce -0.15 2 0.26 0.80 -0.54 3 1.18 0.92 0.26 4 1.32 3.45 -2.13 5 0.37 3.85 -3.48 6 2.18 4.96 -2.78 7 2.62 2.11 0.51 8 0.85 1.97 -1.12 9 1.27 2.47 -1.20 Mean 1.19 2.38 -1.18 5 0.85 4 0.70 3 1 Urinary Excretion by Inhalation Method 2 DISK 1 MDI Urinary Excretion Volunte er MDI DISK Inhalation method 30 Paired samples T-test: Example o Outcome V.: Urinary Excretion-->Continuous o Independent V.: 1 factor with two levels, within sbjs: MDI vs. DISK paired-samples t-test uses the following test statistic: t = sample mean (difference) / s.e. of sample mean (difference) o Under the Null Hypothesis (H0) we expect to see a mean difference in urinary excretion close to zero in the sample. o The larger the sample mean difference (either positive or negative) the more our data appear to depart from the H0. o The standard error (=standard deviation / n) of the sample mean tells us how far from zero we might expect the sample 31 mean to be under the N Paired samples T-test: Example Null Hypothesis: “In the population of volunteers studied urinary excretion of Albuterol is the same whether delivered with MDI or DISK” Does our sample mean of –1.18 provide any evidence that this is not true? Sample mean = -1.181; S.E. of sample mean = 0.459 Statistic: t = -1.181/0.459 = -2.574 Using a computer, or published tables, we obtain p=0.0329 (two-sided) i.e. if the H0 were true there would only be a 3.3 % chance that we would see such a large, or larger, mean change in our sample. 32 What Have We Learned? • Every subject can be expected to respond differently. • DISK does not increase the urinary excretion of every patient. • The mean difference in urinary excretion was 1.18 mmol/litre (significantly less on MDI) • We can not be certain how new volunteers will respond to the same treatment. 33 Assumptions of paired-samples t-test • The sample is representative of the population. • Measurements in the sample are all independent of one another. • Measurements are Normally distributed in the population. • If the assumptions of a test are not met the p-value may not be correct. 34 Univariate Statistical Methods: Compare Two Paired Proportions Design or Aim of Study Type of Outcome Data/Assumptions Statistical Method COMPARE DIFFERENCES IN A PAIRED SAMPLE Test Mean Difference Continuous, Normal dist. for differences Paired (matched) Samples T-test Compare Two paired Proportions Categorical, Binary McNemar’s test Distribution of Differences Ordinal, symmetrical distributions Wilcoxon matched paired test 35 Binary outcome, Paired samples: McNemar’s Test Bronchodialator treatment and deaths from asthma: case-control study. BMJ 2005 Each patient who died was matched to a surviving patient, and risk factors looked at use of β2 Antagonist. = •H0 :The prevalence of use of β2 Antagonist is the same among patients who died and patients who survived • McNemar’s is based on discordant pairs. Assuming there is no association there should be as many of each (yes/no) & (no/yes) 36 Binary outcome, Paired samples: McNemar’s Test = Expected Frequency of Discordant pairs = (69+45)/2 = 57 S (O - E ) E discordant cells ( 45 - 57) 57 2 + 2 = ( 69 - 57) 57 2 = 5.05 37 Univariate Statistical Methods: Test Distribution of Differences in Paired Samples Design or Aim of Study Type of Outcome Data/Assumptions Statistical Method COMPARE DIFFERENCES IN A PAIRED SAMPLE Test Mean Difference Continuous, Normal dist. for differences Paired (matched) Samples T-test Compare Two paired Proportions Categorical, Binary McNemar’s test Distribution of Differences Ordinal, symmetrical distributions Wilcoxon matched paired test 38 Ordinal Outcome Data: Wilcoxon Matched Pairs Test Placebo Active 0.68 0.61 0.96 1.00 0.85 0.72 0.93 0.81 0.35 0.26 0.77 0.7 0.74 0.61 0.98 1.00 1.00 0.98 0.28 0.16 0.92 0.79 1.00 0.98 0.93 1.00 1.00 0.96 0.45 0.49 0.82 0.74 0.36 0.26 1.00 0.97 0.49 0.43 0.74 0.66 0.68 0.61 39 Sample Mean Difference Sign A-P Rank -0.07 0.04 -0.13 -0.12 -0.09 -0.07 -0.13 0.02 -0.02 -0.12 -0.13 -0.02 0.07 -0.04 0.04 -0.08 -0.1 -0.03 -0.06 -0.08 -0.07 -0.056 + + + + - 10.5 6 20 17.5 15 10.5 20 2 2 17.5 20 2 10.5 6 6 13.5 16 4 8 13.5 10.5 CROSS-OVER TRIAL: •Measure of Treatment Effect on proportion of days with headache •H0=The distribution of differences is symmetrical about zero test based on probability ditribution of ranks of the differences •If the null hypothesis is true a plus is as likely as a minus Probability(A>P)= Probability(P>A)=0.5 Ordinal Outcome Data: Wilcoxon Matched Pairs Test - We rank the differences ignoring the sign - We sum the ranks of the positive and negative differences separately (excluding 0): T+ & T If the NH were true, we would expect the two rank sums to be about the same The test statistic, T, is the lesser of the sums. T+=206.5, T-=24.5. The smaller the T, the lower the probability of the data arising by chance. 40 3 Compare Several Independent Samples 41 Univariate Statistical Methods: Compare Several Independent Samples Design or Aim of Study Type of Outcome Data/Assumptions Statistical Method COMPARE SEVERAL INDEPENDENT SAMPLES Compare Several means Continuous, Normal dist., Same Variance Compare Time to an Event Survival in Several Groups One-way Analysis of Variance (ANOVA) Logrank Test 42 Univariate Statistical Methods: Compare Several Means Design or Aim of Study Type of Outcome Data/Assumptions Statistical Method COMPARE SEVERAL INDEPENDENT SAMPLES Compare Several means Continuous, Normal dist., Same Variance Compare Time to an Event Survival in Several Groups One-way Analysis of Variance (ANOVA) Logrank Test 43 Compare Several Means: ANOVA • An extension of t-test to compare 3 or more independent groups. • H0: The samples for each group com from populations with same mean values. • It provides one p value comparing all groups. Only if that is significant further contrasts are justified. • ANOVA is based on partitioning variability : Between Group Variance: Variability (differences) between the groups Residual Variance: Remaining variability due to within group differences. 44 ANOVA: The F ratio Statistic: The F ratio of the two variances if the groups are truly different Between-Group variability should be greater than the residual (Within-Group). “Visual ANOVA" from the Wolfram Demonstrations Project http://demonstrations.wolfram.com/VisualANOVA/ 45 ANOVA: The F ratio Hi BW Low Within Not all groups differ! 46 ANOVA: The F ratio Hi BW Hi Within 47 ANOVA: Example Use of RAB 753 in psoriasis – a multi centre Phase 2 clinical trial Protocol: A double blind randomised placebo controlled parallel group study of 2 concentrations of topical RSB 753 (.5 and 1%) in patients with moderate to severe psoriasis. Primary Outcome Measure: PASI score (4 weeks score on table). Treatment Groups: Placebo, .5% and 1%. N=24 per group Placebo 4.2 4 4.3 4.5 4.2 4.8 4.8 4 4.8 4.6 4.7 3.8 4.7 4.6 4.6 4.3 4.6 4 4.3 4.7 4.8 4.9 4 4 0.50% 4 3.9 3.9 3.3 3.9 3.5 4 4.4 3.6 4.4 4 3.7 3.5 4.7 4 3.5 4 3.6 4.4 4 4 3.5 3.6 3 1% 3.2 3.2 3.6 4.1 3.8 3.1 2.9 4.8 4.9 3.4 3.5 2.8 2.7 4.8 3.2 3.2 3.7 4.5 3 2.8 4.8 3 2.8 4.8 48 ANOVA: Example 49 ANOVA: Example Group Means are different overall. 5 Conclusions • The Design and Type of Data determine the proper statistical test to be used. • Statistical analysis must be guided by hypothesis. It’s not a fishing expedition Beware of Type I error. • Specify clearly for your statistician: 1. Design 2. Hypothesis 3. Variables: type and scale of measurement 51