Section IV
Sampling distributions
Confidence intervals
Hypothesis testing and p values

Population and sample
We wish to make inferences (generalizations) about an entire target population (ie, generalize to “everyone”) even though we only study one sample (have only one study).
Population parameters = summary values for the entire population (ex: μ, σ, ρ, β)
Sample statistics = summary values for a sample (ex: Ȳ, S, r, b)

Samples drawn from a population
[Diagram: a sample drawn from the population]
The sample is drawn “at random”. Everyone in the target population is eligible for sampling.

True population distribution of Y (individuals) - not Gaussian
[Figure: bar chart of the original distribution of Y for individuals; x-axis Y = 1, 2, 3, 4; y-axis percent of individuals]
Mean Y = μ = 2.5, SD = σ = 1.12

Possible samples & statistics from the population (true mean = 2.5)
sample (n=4)    mean (statistic)
1,1,1,1         1.00
…
2,2,4,3         2.75
…
4,4,4,4         4.00

Distribution of the sample means (Ȳs) - sampling distribution
Each observation is a SAMPLE statistic.
[Figure: histogram of the sampling distribution; x-axis Ȳ from 1.00 to 4.00 in steps of 0.25; y-axis frequency]
Mean Ȳ = 2.5, SEM = 0.56, n = 4
SEM = SD/√n = 1.12/√4 = 0.56 (the square root n law)

Central Limit Theorem
For a large enough n, the distribution of any sample statistic (mean, mean difference, OR, RR, hazard, correlation coefficient, regression coefficient, proportion…) from sample to sample has an approximately Gaussian (“Normal”) distribution centered at the true population value. The standard error is proportional to 1/√n.
(Rule of thumb: n > 30 is usually enough. May need non parametric methods for small n.)

Funnel plot - true difference is δ = 5
Each point is one study (meta analysis)
[Figure: funnel plot; x-axis sample mean difference; y-axis sample size (n)]

Resampling estimation (“bootstrap”)
One does not repeatedly sample from the same population (one only carries out the study once).
But a “simulation” of repeated sampling from the population can be obtained by repeatedly sampling from the sample with replacement and computing the statistic from each resample, creating an “estimated” sampling distribution. The SD of the statistics across all “resamples” is an estimate of the standard error (SE) for the statistic.

Resamples drawn from the original sample
[Diagram: population, one original sample, and several resamples drawn from the original sample]
Each resample is drawn “at random” with replacement. Everyone in the original sample is eligible for sampling.

Confidence interval (for μ)
We do not know μ from a sample. For a sample mean Ȳ and standard error SE, a confidence interval for the population mean μ is formed by
(Ȳ - Z SE, Ȳ + Z SE)   (sample statistic is in the middle)
For a 95% confidence interval, we use Z = 1.96 (Why?) and compute
(Ȳ - 1.96 SE, Ȳ + 1.96 SE) = (lower bound, upper bound), with the mean in the middle

Confidence intervals (CI) and sampling distribution of Ȳ
[Figure: Gaussian sampling distribution of Ȳ with 2.5% of the area in each tail, beyond -1.96(σ/√n) and +1.96(σ/√n)]
95% CI: Ȳ ± 1.96 (σ/√n)

95% confidence intervals
95% of the intervals will contain the true population value.
But which ones?

Z vs t (technical note)
Confidence intervals made with Z assume that the population σ is known. Since σ is usually not known and is estimated with the sample SD, the Gaussian table areas need to be adjusted. The adjusted tables are called “t” tables instead of Gaussian tables (t distribution). For n > 30, they are about the same.

[Figure: Z distribution vs t distribution, about the same for n > 30]

t vs Gaussian Z percentiles
%ile          85th    90th    95th    97.5th   99.5th
Confidence    70%     80%     90%     95%      99%
t, df=5       1.156   1.476   2.015   2.571    4.032
t, df=10      1.093   1.372   1.812   2.228    3.169
t, df=20      1.064   1.325   1.725   2.086    2.845
t, df=30      1.055   1.310   1.697   2.042    2.750
Gaussian      1.036   1.282   1.645   1.960    2.576

What did the z distribution say to the t distribution?
You may look like me but you're not normal.
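
Sketch: bootstrap SE and the 1.96 multiplier
The resampling and confidence interval slides above can be tried out with a few lines of Python (standard library only). This is a minimal sketch, not a definitive implementation: the 20 data values, the seed, the number of resamples, and the function names z_multiplier and bootstrap_se_of_mean are all hypothetical, chosen only for illustration. It also suggests one answer to “Why 1.96?”: it is the 97.5th percentile of the Gaussian distribution, which leaves 2.5% in each tail.

```python
import math
import random
import statistics
from statistics import NormalDist


def z_multiplier(confidence):
    """Two-sided Gaussian multiplier, e.g. 0.95 -> about 1.96."""
    # A central area of `confidence` leaves (1 - confidence)/2 in each tail,
    # so the multiplier is the (0.5 + confidence/2) quantile.
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def bootstrap_se_of_mean(sample, n_resamples=2000, seed=0):
    """SD of the resample means = bootstrap estimate of the SE of the mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    resample_means = []
    for _ in range(n_resamples):
        resample = rng.choices(sample, k=n)  # draw n values WITH replacement
        resample_means.append(statistics.fmean(resample))
    return statistics.stdev(resample_means)


# Hypothetical sample of n=20 measurements (illustration only).
sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.9, 4.4, 5.0, 6.3,
          3.9, 5.5, 4.8, 5.2, 4.6, 5.7, 4.2, 5.4, 4.9, 5.6]

mean = statistics.fmean(sample)
se_formula = statistics.stdev(sample) / math.sqrt(len(sample))  # SD / sqrt(n)
se_boot = bootstrap_se_of_mean(sample)
z95 = z_multiplier(0.95)
ci_low, ci_high = mean - z95 * se_boot, mean + z95 * se_boot

print(f"Z for 95% confidence = {z95:.3f}")
print(f"SE from SD/sqrt(n) = {se_formula:.3f}, SE from bootstrap = {se_boot:.3f}")
print(f"95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```

The two SE estimates should be close but not identical: the bootstrap value depends on the random resamples, and for small n it tends to run slightly below SD/√n.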

Confidence intervals
Sample Statistic ± Ztabled SE (using known variance)
Sample Statistic ± ttabled SE (using estimate of variance)
Example: CI for the difference between two means:
(Ȳ1 - Ȳ2) ± ttabled (SEd)
Tabled t uses degrees of freedom, df = n1 + n2 - 2

CI for a proportion - “law” of small numbers
n = 10, Proportion = 3/10 = 30%
What do you think are the 95% confidence bounds?
Is it likely that the “real” proportion is more than 50%?
Answer: 95% CI: 6.7% to 65.3%

Standard error for the difference between two means
Ȳ1 has mean μ1 and SE1 = √(σ1²/n1)
Ȳ2 has mean μ2 and SE2 = √(σ2²/n2)
For the difference between two means (δ = μ1 - μ2):
SEd = √(σ1²/n1 + σ2²/n2) = √(SE1² + SE2²)
SEd is computed from SE1 and SE2 using “Pythagoras’ rule”: SEd² = SE1² + SE2²
[Diagram: right triangle with legs SE1 and SE2 and hypotenuse SEd]

Statistics for HbA1c change from baseline to 26 weeks (Pratley et al, Lancet 2010)
Tx            n     Mean    SD     SE
Liraglutide   225   -1.24   0.99   0.066
Sitagliptin   219   -0.90   0.98   0.066
Mean difference = d = 0.34%
Std error of mean difference = SEd = √[0.066² + 0.066²] = 0.093%
Using t{df=442} = 1.97 for the 95% confidence interval:
CI: 0.34% ± 1.97 (0.093%) or (0.16%, 0.52%)

Null hypothesis & p values
Null hypothesis: assume that, in the population, the two treatments give the same average improvement in HbA1c, so the average difference is δ = 0.
Under this assumption, how likely is it to observe a sample mean difference of d = 0.34% (or more extreme) in any study?
This probability is called the (one sided) p value.
The p value is only defined for a given null hypothesis.

Hypothesis testing for a mean difference, d
d = sample mean difference in HbA1c change
d = 0.34%, SEd = 0.093%
95% CI for true mean difference = (0.16%, 0.52%)
But, under the null hypothesis, the true mean difference (δ) should be zero.
How “far” is the observed 0.34% mean difference from zero (in SE units)?
tobs = (mean difference - hypothesized difference) / SEdiff
tobs = (0.34 - 0) / 0.093 = 3.66 SEs
p value: probability of observing t = 3.66 or larger if the null hypothesis is true.
p value ≈ 0.00014 (one sided t with df=442)
p value ≈ 0.0003 (two sided)

Hypothesis test statistics
Zobs = (Sample Statistic - null value) / Standard error
[Figure: Gaussian curve on the z axis from -3.0 to 3.0; the p value is the tail area beyond the observed Z (or t) = 3.66]

Difference & non inferiority (equivalence) hypothesis testing
Difference testing:
  Null hyp: A = B (or A - B = 0), Alternative: A ≠ B
  Zobs = (observed stat - 0) / SE
Non inferiority (within δ) testing:
  Null hyp: A > B + δ, Alternative: A <= B + δ
  Zeq = (observed stat - δ) / SE
Must specify δ for non inferiority testing.

Non inferiority testing - HbA1c data
For the HbA1c data, assume we declare non inferiority if the true mean difference is δ = 0.40% or less. The observed mean difference is d = 0.34%, which is smaller than 0.40%. However, the null hypothesis is that the true difference is 0.40% or more, versus the alternative that it is less than 0.40%.
So Zeq = (0.34 - 0.40)/0.093 = -0.64, p = 0.26 (one sided)
We cannot reject the “null hyp” that the true δ is larger than 0.40%.
Our 95% confidence interval of (0.16%, 0.52%) also does NOT exclude 0.40%, even though it excludes zero.

Confidence intervals versus hypothesis testing
Studies 1-8; equivalence demonstrated only from -D to +D (brackets show 95% confidence intervals)
Study   Stat sig   Conclusion
1       Yes        not equivalent
2       Yes        uncertain
3       Yes        equivalent
4       No         equivalent
5       Yes        equivalent
6       Yes        uncertain
7       Yes        not equivalent
8       No         uncertain
[Figure: the eight 95% confidence intervals plotted against the true difference, with the axis marked at -D, 0, and +D; the intervals run from the far right (study 1) to the far left (study 7), and study 8 has a very wide interval]
Ref: Cleophas, Zwinderman, Cleophas. Statistics Applied to Clinical Trials. Kluwer Academic Pub, 2000, page 35.

Non inferiority
JAMA 2006 - Piaggio et al, p 1152-1160

Paired mean comparisons
Serum cholesterol in mmol/L
Difference between baseline and end of 4 weeks
Subject           1     2     3     4     5     6     mean   SD     SE
chol (baseline)   9.0   7.1   6.9   6.9   5.9   5.4   6.87   1.24   0.51
chol (4 wks)      6.5   6.3   5.9   4.9   4.0   4.9   5.42   0.97   0.40
difference (di)   2.5   0.8   1.0   2.0   1.9   0.5   1.45   0.79   0.32
Difference (baseline - 4 weeks) = amount lowered: d = 1.45 mmol/L
SD = 0.79 mmol/L
SEd = 0.79/√6 = 0.323 mmol/L, df = 6 - 1 = 5, t0.975 = 2.571
95% CI: 1.45 ± 2.571 (0.323) = 1.45 ± 0.830 or (0.62 mmol/L, 2.28 mmol/L)
tobs = 1.45 / 0.323 = 4.49, p value < 0.01 (two sided)

Confidence intervals and hypothesis tests
Confidence intervals are of the form
Sample Statistic ± (Zpercentile*) (Standard error)
Lower bound = Sample Statistic - (Zpercentile)(Standard error)
Upper bound = Sample Statistic + (Zpercentile)(Standard error)
Hypothesis test statistics (Zobs*) are of the form
Zobs = (Sample Statistic - null value) / Standard error
* t percentile or tobs for continuous data when n is small

Sample statistics and their SEs
Sample statistic                                   Symbol                      Standard error (SE)
Mean                                               Ȳ                           S/√n = √[S²/n] = SEM
Mean difference                                    Ȳ1 - Ȳ2 = d                 √[S1²/n1 + S2²/n2] = SEd
Proportion                                         P                           √[P(1-P)/n]
Proportion difference                              P1 - P2                     √[P1(1-P1)/n1 + P2(1-P2)/n2]
Log odds ratio*                                    logeOR                      √[1/a + 1/b + 1/c + 1/d]
Log risk ratio*                                    logeRR                      √[1/a - 1/(a+c) + 1/b - 1/(b+d)]
Slope (rate)                                       b                           Serror / (Sx √(n-1))
Hazard rate (survival)                             h                           h/√[number dead]
Transform (z) of the correlation coefficient r*    z = ½ loge[(1+r)/(1-r)]     SE(z) = 1/√(n-3)
Anti-transform for the correlation: r = (e^(2z) - 1)/(e^(2z) + 1)
* Form CI bounds on transformed scale, then take anti-transform

Handy guide to testing
Sample statistic & comparison                     Population null hypothesis
Comparing two means                               True population mean difference is zero
Comparing two proportions                         True population difference is zero
Comparing two medians                             True population median difference is zero
Odds ratio (comparing odds)                       True population odds ratio is one
Risk ratio = relative risk (comparing risks)      True population risk ratio is one
Correlation coefficient (compare to zero)         True population correlation coefficient is zero
Slope = rate of change = regression coefficient   True population slope is zero
Comparing two survival curves                     True difference in survival is zero at all times

Nomenclature for testing
Delta (δ) = True difference or size of effect
Alpha (α) = Type I error = false positive = Probability of rejecting the null hypothesis when it is true. (Usually α is set to 0.05)
Beta (β) = Type II error = false negative = Probability of not rejecting the null hypothesis when delta is not zero (there is a real difference in the population)
Power = 1 - β = Probability of getting a p value less than α (ie declaring statistical significance) when, in fact, there really is a non-zero delta.
We want small alpha levels and high power.
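
Sketch: alpha and power for a two sided Z test
The definitions of α, β and power above can be made concrete with the Gaussian approximation. This is a rough sketch under stated assumptions, not a sample size tool: it treats the standard error as known, uses SEd = 0.093% from the HbA1c example, and the candidate true differences (delta values) and the function name power_two_sided are hypothetical, chosen for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # Gaussian with mean 0 and SD 1


def power_two_sided(delta, se, alpha=0.05):
    """Approximate power of a two-sided Z test when the true difference is delta."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    shift = delta / se                          # true difference in SE units
    # Zobs is approximately Gaussian with mean `shift` and SD 1.
    # We reject the null when Zobs > z_crit or Zobs < -z_crit.
    upper_tail = 1 - STD_NORMAL.cdf(z_crit - shift)
    lower_tail = STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


se_d = 0.093  # SE of the mean difference from the HbA1c example (in %)

# Hypothetical true differences; delta = 0 is the null hypothesis itself.
for delta in (0.0, 0.10, 0.20, 0.34):
    power = power_two_sided(delta, se_d)
    beta = 1 - power
    print(f"delta = {delta:.2f}%: power = {power:.3f}, beta = {beta:.3f}")
```

When delta is zero, the probability of rejecting equals α (a false positive). As delta grows relative to the SE, power rises and β falls, which is why a smaller SE (larger n) buys more power.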

Statistical hypothesis testing
Statistic / type of comparison      Test / analysis procedure
Mean comparison - unpaired          t test (2 groups), ANOVA (3+ groups)
Mean comparison - paired            paired t test, repeated measures ANOVA
Median comparison - unpaired        Wilcoxon rank sum test, Kruskal-Wallis test*
Median comparison - paired          Wilcoxon signed rank test on differences*
Proportion comparison - unpaired    chi-square test (or Fisher's test)
Proportion comparison - paired      McNemar's chi-square test
Odds ratio                          chi-square test, Fisher test
Risk ratio                          chi-square test, Fisher test
Correlation, slope                  regression, t statistic
Survival curves, hazard rates       log rank test*
ANOVA = analysis of variance
* non parametric - Gaussian distribution theory is not used to get the p value

Parametric vs non parametric
Non parametric methods compute p values using ranks of the data. They do not assume that the statistics follow a Gaussian distribution, particularly in the distribution “tails”.
Parametric                       Nonparametric
2 indep means: t test            2 indep medians: Wilcoxon rank sum test (= Mann-Whitney, MW)
3+ indep means: ANOVA F test     3+ indep medians: Kruskal-Wallis test
Paired means: paired t test      Paired medians: Wilcoxon signed rank test
Pearson correlation              Spearman correlation
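
Sketch: paired t statistic vs Wilcoxon signed rank test
To see the parametric and non parametric routes side by side, the sketch below applies both to the six paired cholesterol measurements from the paired mean comparison slide. It is a minimal illustration using only the Python standard library, not a general-purpose implementation: the signed rank part assumes no zero differences and no tied absolute differences (true for these data), gets its exact p value by listing all 2^6 sign patterns, and the variable names are hypothetical.

```python
import math
import statistics
from itertools import product

# Serum cholesterol (mmol/L) for the six subjects on the paired comparison slide.
baseline = [9.0, 7.1, 6.9, 6.9, 5.9, 5.4]
week4 = [6.5, 6.3, 5.9, 4.9, 4.0, 4.9]
diffs = [round(b - w, 1) for b, w in zip(baseline, week4)]
n = len(diffs)

# Parametric route: paired t statistic = mean difference / SE of the differences.
d_bar = statistics.fmean(diffs)
se_d = statistics.stdev(diffs) / math.sqrt(n)
t_obs = d_bar / se_d

# Non parametric route: rank the absolute differences (1 = smallest),
# then add up the ranks that belong to positive differences.
order = sorted(range(n), key=lambda i: abs(diffs[i]))
ranks = [0] * n
for rank, i in enumerate(order, start=1):
    ranks[i] = rank
w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

# Exact null distribution: under the null hypothesis each rank is equally
# likely to carry a + or a - sign, so list every possible sign pattern.
null_w = [sum(rank for rank, plus in zip(range(1, n + 1), signs) if plus)
          for signs in product((False, True), repeat=n)]
centre = n * (n + 1) / 4  # expected rank sum under the null hypothesis
as_extreme = sum(1 for w in null_w if abs(w - centre) >= abs(w_plus - centre))
p_two_sided = as_extreme / len(null_w)

print(f"paired t: d = {d_bar:.2f}, SEd = {se_d:.3f}, tobs = {t_obs:.2f} (df = {n - 1})")
print(f"signed rank: W+ = {w_plus}, exact two sided p = {p_two_sided:.4f}")
```

All six differences are positive, so the rank sum is the largest possible (21) and the exact two sided p value is 2/64 = 0.031. With only six pairs, no set of data can give a smaller two sided signed rank p value.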