# Confidence intervals, hypothesis testing, p values ```Section IV
Sampling distributions
Confidence intervals
Hypothesis testing and p values
1
Population and sample
We wish to make inferences about the
population (i.e., generalize to “everyone”)
even though we study only one sample
(have only one study).
Population parameters = summary values for
the entire population (e.g., μ, σ, ρ, β)
Sample statistics = summary values for a
sample (e.g., Ȳ, S, r, b)
2
Samples drawn from a population
[Diagram: a sample drawn from the population]
The sample is drawn “at random”.
Everyone in the target population
is eligible for sampling.
3
True population distribution of Y
(individuals): not Gaussian
[Figure: bar chart of the original distribution of Y for individuals; x-axis: Y (1 to 4), y-axis: percent of individuals (0% to 30%)]
Mean Y = μ = 2.5,
SD = σ = 1.12
4
Possible samples & statistics from
the population (true mean = 2.5)
sample (n=4)    mean (statistic)
1,1,1,1         1.00
…               …
2,2,4,3         2.75
…               …
4,4,4,4         4.00
5
Distribution of the sample means (Ȳs)
Sampling distribution: each observation is a SAMPLE statistic
[Figure: histogram of the sampling distribution; x-axis: sample mean Ȳ (1.00 to 4.00 in steps of 0.25), y-axis: frequency (0 to 50)]
Mean Ȳ = 2.5, SEM = 0.56, n = 4
SEM = SD/√n = 1.12/√4 = 0.56
(the square root n law)
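As a check on the square root n law, the sketch below enumerates every possible sample of size n = 4 and compares the SD of the sample means with σ/√n. It assumes the population puts equal weight on the values 1, 2, 3 and 4. That is an assumption: it reproduces the stated μ = 2.5 and σ = 1.12, but the bar heights of the original chart are not recoverable here.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

# Assumed population: values 1..4, each equally likely (gives mu = 2.5, sigma = 1.12).
values = [1, 2, 3, 4]
n = 4

pop_mean = mean(values)
pop_sd = pstdev(values)  # population SD (divides by N, not N - 1)

# Every ordered sample of size n drawn with replacement: 4**4 = 256 samples.
sample_means = [mean(s) for s in product(values, repeat=n)]

mean_of_means = mean(sample_means)     # center of the sampling distribution
sem_enumerated = pstdev(sample_means)  # SD of the sample means
sem_formula = pop_sd / sqrt(n)         # square root n law

print(f"mu = {pop_mean}, sigma = {pop_sd:.2f}")
print(f"mean of sample means = {mean_of_means}")
print(f"SEM enumerated = {sem_enumerated:.2f}, sigma/sqrt(n) = {sem_formula:.2f}")
```

Under this assumption the two SEM values agree, and both round to the 0.56 quoted on the slide.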
6
Central Limit Theorem
For a large enough n, the distribution of most
sample statistics (mean, mean difference,
OR, RR, hazard, correlation coefficient, regression
coefficient, proportion…) from sample to sample is
approximately Gaussian (“Normal”) and centered at
the true population value.
The standard error is proportional to 1/√n.
(Rule of thumb: n > 30 is usually enough. May
need nonparametric methods for small n.)
7
8
Funnel plot - true difference is δ = 5
Each point is one study (meta-analysis)
[Figure: funnel plot; x-axis: sample mean difference (-15.0 to 25.0), y-axis: sample size n (0 to 400)]
9
Resampling estimation
(“bootstrap”)
One does not repeatedly sample from the
same population (one only carries out the
study once). But a “simulation” of repeated
sampling from the population can be obtained
by repeatedly sampling from the sample with
replacement and computing the statistic from
each resample, creating an “estimated”
sampling distribution. The SD of the statistics
across all “resamples” is an estimate of the
standard error (SE) for the statistic.
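The resampling recipe can be sketched in a few lines. The data values, the number of resamples and the function name `bootstrap_se` are illustrative assumptions, not part of the slides.

```python
import random
from math import sqrt
from statistics import mean, stdev

def bootstrap_se(sample, statistic=mean, n_resamples=5000, seed=0):
    """Estimate the SE of `statistic` by resampling `sample` with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    stats = []
    for _ in range(n_resamples):
        # Each resample has the same size as the original sample.
        resample = [rng.choice(sample) for _ in range(n)]
        stats.append(statistic(resample))
    # SD of the statistic across resamples = bootstrap estimate of its SE.
    return stdev(stats)

# Hypothetical sample, for illustration only.
sample = [2.5, 0.8, 1.0, 2.0, 1.9, 0.5, 1.4, 1.7, 0.9, 2.2]

boot_se = bootstrap_se(sample)
formula_se = stdev(sample) / sqrt(len(sample))  # usual S/sqrt(n) for a mean
print(f"bootstrap SE = {boot_se:.3f}, S/sqrt(n) = {formula_se:.3f}")
```

For a mean, the bootstrap SE should land close to S/√n. The approach earns its keep for statistics such as a median, where no simple SE formula is available; passing `statistic=median` would be the only change.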
10
Samples drawn from a population
[Diagram: population, the original sample drawn from it, and repeated resamples drawn from the original sample]
Each resample is drawn “at random” with
replacement. Everyone in the original
sample is eligible for sampling.
11
Confidence interval (for μ)
We do not know μ from a sample.
For a sample mean Ȳ and standard error
SE, a confidence interval for the
population mean μ is formed by
(Ȳ – Z SE, Ȳ + Z SE)
(sample statistic is in the middle)
For a 95% confidence interval, we use
Z = 1.96 (Why?) and compute
lower bound = Ȳ – 1.96 SE, upper bound = Ȳ + 1.96 SE
(the sample mean Ȳ is midway between the bounds)
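One way to answer the “Why?”: 1.96 is the 97.5th percentile of the standard Gaussian, so 2.5% of the area lies in each tail and 95% in the middle. A minimal sketch using the Python standard library follows; the function names and the inputs Ȳ = 2.75 and SE = 0.56 are illustrative choices.

```python
from statistics import NormalDist

def z_for_confidence(conf):
    """Gaussian multiplier Z leaving (1 - conf)/2 in each tail."""
    return NormalDist().inv_cdf(0.5 + conf / 2)

def mean_ci(ybar, se, conf=0.95):
    """Confidence interval (lower, upper) for the population mean."""
    z = z_for_confidence(conf)
    return ybar - z * se, ybar + z * se

z95 = z_for_confidence(0.95)
lower, upper = mean_ci(2.75, 0.56)  # illustrative sample mean and SE

print(f"Z for 95% confidence = {z95:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```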
12
Confidence Intervals (CI)
and sampling distribution of Ȳ
[Figure: Gaussian sampling distribution of Ȳ centered at μ, with 2.5% of the area in each tail beyond μ – 1.96(σ/√n) and μ + 1.96(σ/√n)]
95% CI: Ȳ ± 1.96 (σ/√n)
13
95% Confidence
intervals
95% of the
intervals will
contain the true
population value
But which ones?
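The “95% of the intervals” claim can be checked by simulation. The sketch below assumes a Gaussian population with known σ; the values μ = 2.5, σ = 1.12, n = 30 and 2000 simulated studies are illustrative assumptions.

```python
import random
from math import sqrt

def coverage(mu=2.5, sigma=1.12, n=30, n_studies=2000, z=1.96, seed=1):
    """Fraction of simulated 95% CIs that contain the true mean mu."""
    rng = random.Random(seed)
    se = sigma / sqrt(n)  # known-sigma standard error
    hits = 0
    for _ in range(n_studies):
        # One simulated study: draw n observations and average them.
        ybar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if ybar - z * se <= mu <= ybar + z * se:
            hits += 1
    return hits / n_studies

cov = coverage()
print(f"proportion of intervals containing mu: {cov:.3f}")
```

In a real study only one interval is observed, and nothing in that interval reveals whether it is one of the roughly 5% that miss.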
14
Z vs t (technical note)
Confidence intervals made with Z assume
that the population σ is known. Since σ is
usually not known and is estimated with
the sample SD, percentiles from the t
distribution (“t” tables) are used in place of
the Gaussian tables. For n > 30, they are
about the same.
15
Z distribution vs t distribution: about the same for n > 30
16
t vs Gaussian Z percentiles
%ile         85th    90th    95th    97.5th  99.5th
Confidence   70%     80%     90%     95%     99%
t, df=5      1.156   1.476   2.015   2.571   4.032
t, df=10     1.093   1.372   1.812   2.228   3.169
t, df=20     1.064   1.325   1.725   2.086   2.845
t, df=30     1.055   1.310   1.697   2.042   2.750
Gaussian     1.036   1.282   1.645   1.960   2.576
What did the z distribution say to the t distribution?
You may look like me but you're not normal.
17
Confidence Intervals
Sample Statistic ± Z(tabled) × SE
(using known variance)
Sample Statistic ± t(tabled) × SE
(using estimate of variance)
Example: CI for the difference between two means:
(Ȳ1 – Ȳ2) ± t(tabled) × SEd
Tabled t uses degrees of freedom, df = n1 + n2 – 2
18
CI for a proportion
“law” of small numbers
n=10, Proportion = 3/10 = 30%
What do you think are the 95% confidence
bounds?
Is it likely that the “real” proportion is more
than 50%?
Answer: 95% CI: 6.7% to 65.3%
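The slide does not say how this interval was computed. The exact binomial (Clopper-Pearson) method reproduces it, so the sketch below uses that method as an assumption and shows the large-sample formula P ± 1.96√[P(1-P)/n] for contrast. The function names are illustrative.

```python
from math import comb, sqrt

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def solve_decreasing(f, target, lo=0.0, hi=1.0, iters=100):
    """Bisection for f(p) = target when f is decreasing in p."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f still too large, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, conf=0.95):
    """Clopper-Pearson (exact binomial) confidence interval for a proportion."""
    alpha = 1 - conf
    # Lower bound: the p at which P(X >= x) = alpha/2.
    lower = 0.0 if x == 0 else solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper bound: the p at which P(X <= x) = alpha/2.
    upper = 1.0 if x == n else solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper

x, n = 3, 10
lower, upper = exact_ci(x, n)

p_hat = x / n
se = sqrt(p_hat * (1 - p_hat) / n)
wald_lower, wald_upper = p_hat - 1.96 * se, p_hat + 1.96 * se

print(f"exact 95% CI: {lower:.1%} to {upper:.1%}")
print(f"large-sample 95% CI: {wald_lower:.1%} to {wald_upper:.1%}")
```

With n = 10 the two methods disagree noticeably, which is the point of the “law” of small numbers: either way the interval is wide and includes 50%.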
20
Standard error for the difference
between two means
Ȳ1 has mean μ1 and SE = √(σ1²/n1) = SE1
Ȳ2 has mean μ2 and SE = √(σ2²/n2) = SE2
For the difference between two means
(δ = μ1 – μ2)
SEδ = √(σ1²/n1 + σ2²/n2)
SEd = √(SE1² + SE2²)
[Diagram: right triangle with legs SE1 and SE2 and hypotenuse SEd]
SEd is computed from SE1 and SE2
using “Pythagoras’ rule”.
SEd² = SE1² + SE2²
21
Statistics for HbA1c change
from base to 26 weeks (Pratley et al, Lancet 2010)
Tx            n     Mean    SD     SE
Liraglutide   225   -1.24   0.99   0.066
Sitagliptin   219   -0.90   0.98   0.066
Mean difference = d̄ = 0.34%
Std error of mean difference = SEd = √(0.066² + 0.066²) = 0.093%
Using t{df=442} = 1.97 for the 95% confidence interval:
CI: 0.34% ± 1.97 (0.093%) or (0.16%, 0.52%)
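The arithmetic on this slide can be reproduced from the tabled n, mean and SD. This is a sketch: the variable names are illustrative, and the multiplier 1.97 is taken from the slide rather than computed.

```python
from math import sqrt

# Summary statistics from the table (HbA1c change, in % units).
n1, mean1, sd1 = 225, -1.24, 0.99  # liraglutide
n2, mean2, sd2 = 219, -0.90, 0.98  # sitagliptin

se1 = sd1 / sqrt(n1)
se2 = sd2 / sqrt(n2)

d = mean2 - mean1                # difference in mean change, 0.34
se_d = sqrt(se1**2 + se2**2)     # "Pythagoras' rule" for independent groups
df = n1 + n2 - 2

t_tabled = 1.97                  # 97.5th percentile of t with df = 442, from the slide
lower = d - t_tabled * se_d
upper = d + t_tabled * se_d

print(f"SE1 = {se1:.3f}, SE2 = {se2:.3f}, SEd = {se_d:.3f}, df = {df}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```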
22
Null hypothesis & p values
Null Hypothesis- Assume that, in the population,
the two treatments give the same average
improvement in HbA1c. So the average
difference is δ=0.
Under this assumption, how likely is it to observe
a sample mean difference of d= 0.34% (or more
extreme) in any study? This probability is called
the (one sided) p value.
The p value is only defined for a given null
hypothesis.
23
Hypothesis testing
for a mean difference, d
d̄ = sample mean HbA1c change difference,
d̄ = 0.34%, SEd = 0.093%
95% CI for true mean difference = (0.16%, 0.52%)
But, under the null hypothesis, the true mean difference (δ) should be zero.
How “far” is the observed 0.34% mean difference from zero (in SE
units)?
tobs = (mean difference – hypothesized difference) / SEdiff
tobs = (0.34 – 0) / 0.093 = 3.66 SEs
p value: probability of observing t = 3.66 or larger if the null hypothesis is true.
p value ≈ 0.00014 (one sided t with df=442)
p value ≈ 0.00029 (two sided)
24
Hypothesis test statistics
Zobs = (Sample Statistic – null value) / Standard error
[Figure: Gaussian density with the observed Z (or t) = 3.66 marked and the p value shown as the tail area beyond it; x-axis: z from -3.0 to 3.0]
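The test statistic and an approximate p value can be computed as sketched below. With the rounded inputs d = 0.34 and SEd = 0.093, the ratio works out to about 3.66. The p value here uses the Gaussian in place of the t distribution with df = 442, which is an approximation; the t-based p value is slightly larger.

```python
from statistics import NormalDist

d = 0.34          # observed mean difference (%)
se_d = 0.093      # standard error of the difference (%)
null_value = 0.0  # difference assumed under the null hypothesis

# Distance of the observed difference from the null value, in SE units.
t_obs = (d - null_value) / se_d

# Gaussian approximation to the tail area (reasonable for df = 442).
p_one_sided = 1 - NormalDist().cdf(t_obs)
p_two_sided = 2 * p_one_sided

print(f"t_obs = {t_obs:.2f}")
print(f"one-sided p ~ {p_one_sided:.5f}, two-sided p ~ {p_two_sided:.5f}")
```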
25
26
Difference & Non inferiority
(equivalence) hypothesis testing
Difference Testing:
Null Hyp: A = B (or A – B = 0), Alternative: A ≠ B
Zobs = (observed stat – 0) / SE
Non inferiority (within δ) Testing:
Null Hyp: A > B + δ, Alternative: A ≤ B + δ
Zeq = (observed stat – δ) / SE
Must specify δ for non inferiority testing
27
Non inferiority testing - HbA1c data
For HbA1c data, assume we declare non inferiority if
the true mean difference is less than δ = 0.40%. The
observed mean difference is d = 0.34%, which is
smaller than 0.40%. However, the null hypothesis
is that the true difference is 0.40% or more versus
the alternative that it is less than 0.40%. So
Zeq = (0.34 – 0.40)/0.093 = -0.645, p = 0.260 (one sided)
We cannot reject the “null hyp” that the true δ is 0.40%
or more. Our 95% confidence interval of
(0.16%, 0.52%) also does NOT exclude 0.40%,
even though it excludes zero.
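A minimal sketch of the non inferiority calculation follows, using the Gaussian for the one-sided p value. The variable names are illustrative; the margin 0.40% is the one assumed on the slide.

```python
from statistics import NormalDist

d = 0.34        # observed mean difference (%)
se_d = 0.093    # standard error of the difference (%)
margin = 0.40   # non inferiority margin delta (%)

# Distance of the observed difference from the margin, in SE units.
z_eq = (d - margin) / se_d

# One-sided p value: chance of a difference this far below the margin
# (or further) if the true difference were exactly equal to the margin.
p_one_sided = NormalDist().cdf(z_eq)

non_inferior = p_one_sided < 0.05
print(f"Z_eq = {z_eq:.3f}, one-sided p = {p_one_sided:.3f}, non inferior: {non_inferior}")
```

The observed difference is below the margin, but by less than one standard error, so the data cannot rule out a true difference of 0.40% or more.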
28
Confidence intervals
versus hypothesis testing
Studies 1-8: equivalence is demonstrated only when the 95% confidence interval lies entirely between –D and +D
Study  Stat Sig  95% CI position on the true-difference axis            Conclusion
1      Yes       entirely above +D                                      not equivalent
2      Yes       above 0, overlapping +D                                uncertain
3      Yes       between 0 and +D                                       equivalent
4      No        around 0, within –D to +D                              equivalent
5      Yes       between –D and 0                                       equivalent
6      Yes       below 0, overlapping –D                                uncertain
7      Yes       entirely below –D                                      not equivalent
8      No        wide, including 0 and extending beyond –D to +D        uncertain
[Axis: true difference, marked at –D, 0 and +D]
Ref: Statistics Applied to Clinical Trials, Cleophas, Zwinderman, Cleophas 2000
29
Non inferiority
JAMA 2006 - Piaggio et al, p 1152-1160
30
Paired Mean Comparisons
Serum cholesterol in mmol/L
Difference between baseline and end of 4 weeks
Subject   chol(baseline)   chol(4 wks)   difference(di)
1         9.0              6.5           2.5
2         7.1              6.3           0.8
3         6.9              5.9           1.0
4         6.9              4.9           2.0
5         5.9              4.0           1.9
6         5.4              4.9           0.5
mean      6.87             5.42          1.45
SD        1.24             0.97          0.79
SE        0.51             0.40          0.32
Difference (baseline – 4 weeks) = amount lowered: d̄ = 1.45 mmol/L
SD = 0.79 mmol/L, SEd = 0.79/√6 = 0.323 mmol/L, df = 6 – 1 = 5, t0.975 = 2.571
95% CI: 1.45 ± 2.571 (0.323) = 1.45 ± 0.830 or (0.62 mmol/L, 2.28 mmol/L)
tobs = 1.45 / 0.323 = 4.49, p value < 0.01
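The paired analysis works entirely on the within-subject differences. The sketch below recomputes the summary values from the six subjects; the multiplier 2.571 is the tabled t for df = 5 quoted on the slide, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

baseline = [9.0, 7.1, 6.9, 6.9, 5.9, 5.4]  # mmol/L
week4 = [6.5, 6.3, 5.9, 4.9, 4.0, 4.9]     # mmol/L

# A paired analysis uses one difference per subject.
diffs = [b - w for b, w in zip(baseline, week4)]

n = len(diffs)
d_bar = mean(diffs)
sd_d = stdev(diffs)        # sample SD (divides by n - 1)
se_d = sd_d / sqrt(n)
df = n - 1

t_tabled = 2.571           # 97.5th percentile of t with df = 5, from the slide
lower = d_bar - t_tabled * se_d
upper = d_bar + t_tabled * se_d
t_obs = d_bar / se_d       # null hypothesis: true mean difference is zero

print(f"mean diff = {d_bar:.2f}, SD = {sd_d:.2f}, SE = {se_d:.3f}, df = {df}")
print(f"95% CI: ({lower:.2f}, {upper:.2f}), t_obs = {t_obs:.2f}")
```

Note that the SE of the differences (0.32) is smaller than the SE of either column (0.51 and 0.40), because pairing removes the between-subject variation.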
31
Confidence Intervals
Hypothesis Tests
Confidence intervals are of the form
Sample Statistic +/- (Zpercentile*) (Standard error)
Lower bound = Sample Statistic- (Zpercentile)(Standard error)
Upper bound = Sample Statistic + (Zpercentile)(Standard error)
Hypothesis test statistics (Zobs*) are of the form
Zobs=(Sample Statistic – null value) / Standard error
* t percentile or tobs for continuous data when n is small
32
Sample statistics and their SEs
Sample Statistic          Symbol              Standard error (SE)
Mean                      Ȳ                   S/√n = √[S²/n] = SEM
Mean difference           Ȳ1 – Ȳ2 = d̄         √[S1²/n1 + S2²/n2] = SEd
Proportion                P                   √[P(1-P)/n]
Proportion difference     P1 – P2             √[P1(1-P1)/n1 + P2(1-P2)/n2]
Log odds ratio*           loge OR             √[1/a + 1/b + 1/c + 1/d]
Log risk ratio*           loge RR             √[1/a – 1/(a+c) + 1/b – 1/(b+d)]
Slope (rate)              b                   Serror / (Sx √(n-1))
Hazard rate (survival)    h
Transform (z) of the correlation coefficient r*:
  z = ½ loge[(1+r)/(1-r)], r = (e^(2z) – 1)/(e^(2z) + 1), SE(z) = 1/√(n-3)
* Form CI bounds on transformed scale, then take anti-transform
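The starred rows follow a “transform, build the interval, transform back” recipe. A sketch for the odds ratio follows; the 2×2 counts a, b, c, d are hypothetical values chosen only for illustration, and the function name is an assumption.

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio (a*d)/(b*c) with a CI built on the log scale."""
    odds_ratio = (a * d) / (b * c)
    log_or = log(odds_ratio)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    # Bounds on the transformed (log) scale ...
    log_lower = log_or - z * se_log_or
    log_upper = log_or + z * se_log_or
    # ... then take the anti-transform (exponential).
    return odds_ratio, exp(log_lower), exp(log_upper)

# Hypothetical 2x2 table cell counts, for illustration only.
a, b, c, d = 20, 10, 80, 90

odds_ratio, lower, upper = odds_ratio_ci(a, b, c, d)
print(f"OR = {odds_ratio:.2f}, 95% CI: ({lower:.2f}, {upper:.2f})")
```

The resulting interval is not symmetric around the odds ratio on the original scale, which is expected: the symmetry holds on the log scale.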
33
Handy Guide to Testing
Sample Statistic & Comparison: Population null hypothesis
Comparing two means: True population mean difference is zero
Comparing two proportions: True population difference is zero
Comparing two medians: True population median difference is zero
Odds ratio (comparing odds): True population odds ratio is one
Risk ratio = relative risk (comparing risks): True population risk ratio is one
Correlation coefficient (compare to zero): True population correlation coefficient is zero
Slope = rate of change = regression coefficient: True population slope is zero
Comparing two survival curves: True difference in survival is zero at all times
34
Nomenclature for Testing
Delta (δ) = True difference or size of effect
Alpha (α) = Type I error = false positive
 = Probability of rejecting the null hypothesis when it is true.
 (Usually α is set to 0.05)
Beta (β) = Type II error = false negative
 = Probability of not rejecting the null hypothesis when delta is not zero
 (there is a real difference in the population)
Power = 1 – β
 = Probability of getting a p value less than α
 (i.e., declaring statistical significance)
 when, in fact, there really is a non-zero delta.
We want small alpha levels and high power.
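To make the link between δ, α and power concrete, the sketch below computes the power of a two-sided Z test for a given true difference and standard error. It uses the Gaussian approximation; the inputs δ = 0.34 and SE = 0.093 are illustrative values borrowed from the HbA1c example, and the function name is an assumption.

```python
from statistics import NormalDist

def power_two_sided(delta, se, alpha=0.05):
    """Power of a two-sided Z test when the true difference is delta."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    shift = delta / se                    # true difference in SE units
    # Probability that Z_obs falls beyond +z_crit or below -z_crit.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

power = power_two_sided(delta=0.34, se=0.093)
beta = 1 - power
power_at_null = power_two_sided(delta=0.0, se=0.093)  # equals alpha

print(f"power = {power:.3f}, beta = {beta:.3f}")
print(f"rejection probability when delta = 0: {power_at_null:.3f}")
```

When δ = 0 the rejection probability equals α, the false positive rate. Power rises toward 1 as δ/SE grows, whether through a larger effect or a larger n.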
35
Statistical Hypothesis Testing
Statistic/type of comparison: Test/analysis procedure
Mean comparison-unpaired: t test (2 groups), ANOVA (3+ groups)
Mean comparison-paired: paired t test, repeated measures ANOVA
Median comparison-unpaired: Wilcoxon rank sum test, Kruskal-Wallis test*
Median comparison-paired: Wilcoxon signed rank test on differences*
Proportion comparison-unpaired: chi-square test (or Fisher's test)
Proportion comparison-paired: McNemar’s chi-square test
Odds ratio: chi-square test, Fisher test
Risk ratio: chi-square test, Fisher test
Correlation, slope: regression, t statistic
Survival curves, hazard rates: log rank test*
ANOVA = analysis of variance
* non parametric: Gaussian distribution theory is
not used to get the p value
36
Parametric vs non parametric
Nonparametric methods compute p values using ranks of the data.
They do not assume the statistics follow a Gaussian distribution, particularly
in the distribution “tails”.
Parametric
2 indep means: t test
3+ indep means: ANOVA F test
Paired means: paired t test
Pearson correlation
Nonparametric
2 indep medians: Wilcoxon rank sum test = Mann-Whitney (MW)
3+ indep medians: Kruskal-Wallis test
Paired medians: Wilcoxon signed rank test
Spearman correlation
37
```