Statistics for clinicians

• Biostatistics course by Kevin E. Kip, Ph.D., FAHA
Professor and Executive Director, Research Center
University of South Florida, College of Nursing
Professor, College of Public Health
Department of Epidemiology and Biostatistics
Associate Member, Byrd Alzheimer’s Institute
Morsani College of Medicine
Tampa, FL, USA
SECTION 3.1
Module Overview and Introduction
Confidence intervals, estimation of parameters, and hypothesis testing.
Module 3 Learning Objectives:
1. Describe the concepts of parameter estimation and
confidence intervals
2. Apply use of the z and t distribution for calculation
of confidence intervals based on sample size
3. Select appropriate z and t values based on the
width of a desired confidence interval
4. Calculate and interpret confidence intervals for
means, proportions, and relative risk for one and
two sample designs including matched design
5. Use SPSS to calculate confidence intervals
6. Distinguish the theoretical relationship between the
risk ratio and odds ratio
7. List the concept, guidelines, and primary steps
involved in hypothesis testing
8. Differentiate between the “null” and “alternative”
hypothesis.
9. Understand and interpret parameters used in
hypothesis testing (level of significance, p-value).
10. Differentiate type I and type II error and factors
that impact statistical power.
11. Calculate and interpret sample hypotheses:
a) One-sample - continuous outcome
b) One-sample - dichotomous outcome
c) One-sample - categorical/ ordinal outcome
d) Matched design – continuous outcome
Assigned Reading:
Textbook: Essentials of Biostatistics in
Public Health
Chapters 6 and 7
Key terms
Estimation: Process of determining a likely value for a population
parameter (e.g. mean or proportion) based on a sample.
Point Estimate: Single valued estimate of a population parameter,
such as a mean or a proportion.
Confidence Interval (CI): Range of likely values for a
population parameter with a level of confidence attached (e.g. 95%
confidence that the interval contains the unknown parameter).
General form for CI is:
point estimate ± margin of error
Common confidence levels are 90%, 95%, and 99% but,
theoretically, any level between 0% and 100% can be selected.
SECTION 3.2
Use of the z and t distributions for calculation of confidence intervals
For the standard normal distribution, the following is true:
P(-1.96 < z < 1.96) = 0.95
i.e. there is a 95% probability that a standard normal variable,
denoted z, will fall between -1.96 and 1.96.
Using the Central Limit Theorem, and some algebra, the 95%
confidence interval (CI) for the population mean is:
X̄ ± 1.96 x (σ / sqrt(n))
General form for a CI can be written as:
point estimate ± z x SE(point estimate)
where z is the value from the standard normal distribution reflecting the
desired confidence level, and SE = standard error of the point estimate.
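As a minimal sketch of how the z value follows from the desired confidence level, the snippet below uses Python's standard-library `statistics.NormalDist`. The helper name `z_critical` is hypothetical and not part of the course material.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Two-sided critical value z such that P(-z < Z < z) = confidence."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Leave (1 - confidence) / 2 in each tail of the standard normal curve.
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

A wider confidence level pushes z further into the tails, which is why a 99% interval is wider than a 95% interval built from the same sample.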
For the confidence interval for the mean (or any other parameter), we often
do not know the true value of the population standard deviation (σ).
For large sample sizes (n ≥ 30), σ can be estimated by the sample
standard deviation (s), and the z value can still be used based on the
Central Limit Theorem.
For small sample sizes (n < 30), the Central Limit Theorem does not
apply, and instead, the t distribution is used (Table 2 of Appendix):
• t values depend on n
• small samples have larger t values (less precision)
• values are indexed by degrees of freedom (df = n - 1)
Listing of Selected t Values for Confidence Intervals
Example: For a confidence interval of a mean with n < 30, use t:

        Confidence Level
df      80%      90%      95%      98%      99%
4       1.533    2.132    2.776    3.747    4.604
5       1.476    2.015    2.571    3.365    4.032
6       1.440    1.943    2.447    3.143    3.707
7       1.415    1.895    2.365    2.998    3.499
8       1.397    1.860    2.306    2.896    3.355
9       1.383    1.833    2.262    2.821    3.250
SECTION 3.3
Calculation and interpretation of confidence intervals
One Sample
a) Continuous outcome
b) Dichotomous outcome
CI for One Sample – Continuous Outcome
Parameter: Mean Body Mass Index (BMI)
Sample N: 180 (n > 30, so use large sample – z value)
Sample Mean: 28.2
Sample SD: 5.4
Confidence Level: 95%
Z value: 1.96
95% Confidence Interval for μ:
= 28.2 ± 1.96 x (5.4 / sqrt(180))
= 28.2 ± 0.79
= (27.4, 29.0)
[Figure: 95% C.I. on a number line, with lower limit 27.4, point estimate 28.2, and upper limit 29.0]
From the sample, we estimate the mean BMI as 28.2, and are 95% confident that the
true population mean lies between 27.4 and 29.0.
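The BMI calculation can be sketched in a few lines of Python. The function name `mean_ci` and its `critical` parameter are illustrative assumptions, not part of the course material. The critical value is passed in so the same helper works with a z value or a t value.

```python
import math


def mean_ci(mean, sd, n, critical=1.96):
    """CI for a population mean: mean +/- critical * sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    margin = critical * standard_error
    return mean - margin, mean + margin


# BMI example from the slide: n = 180, mean = 28.2, SD = 5.4, z = 1.96
low, high = mean_ci(28.2, 5.4, 180)
print(f"95% CI for mean BMI: ({low:.1f}, {high:.1f})")
```

For a small sample (n < 30), pass the t value for n - 1 degrees of freedom as `critical` instead of 1.96.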
CI for One Sample – Continuous Outcome (Practice)
Parameter: Mean diastolic blood pressure
Sample N: 503
Sample Mean: 80.69
Sample SD: 10.176
Confidence Level: 95%
Z value: ___ or t value: ____
95% Confidence Interval for μ:
CI for One Sample – Continuous Outcome (Practice)
Parameter: Mean diastolic blood pressure
Sample N: 503 (large sample, n > 30)
Sample Mean: 80.69
Sample SD: 10.176
Confidence Level: 95%
Z value: 1.96
95% Confidence Interval for μ:
= 80.69 ± 1.96 x (10.176 / sqrt(503))
= 80.69 ± 0.889
= (79.8, 81.6)
SPSS: Analyze > Compare Means > One Sample T Test > Options: 95% confidence interval
CI for One Sample – Continuous Outcome (Practice)
Parameter: Mean resting pulse (beats per minute)
Sample N: 14
Sample Mean: 63.3
Sample SD: 9.5
Confidence Level: 95%
Z value: ___ or t value: ____
95% Confidence Interval for μ:
CI for One Sample – Continuous Outcome (Practice)
Parameter: Mean resting pulse (beats per minute)
Sample N: 14 (small sample, n < 30)
Sample Mean: 63.3
Sample SD: 9.5
Confidence Level: 95%
t value: 2.16 (df = n - 1 = 13)
95% Confidence Interval for μ:
= 63.3 ± 2.16 x (9.5 / sqrt(14))
= 63.3 ± 5.484
= (57.8, 68.8)
CI for One Sample – Dichotomous Outcome
Parameter: Proportion of population treated for hypertension
Sample N: 3,532 (large sample, so use z value)
Sample Proportion: 0.345 (i.e. 1,219 / 3,532)
Confidence Level: 95%
Z value: 1.96
95% Confidence Interval for p:
= p̂ ± z x sqrt(p̂(1 – p̂) / n)
= 0.345 ± 1.96 x sqrt(0.345 x 0.655 / 3,532)
= 0.345 ± 0.016
= (0.329, 0.361)
From the sample, we estimate the proportion of persons treated
for hypertension to be 0.345, and we are 95% confident that the
true proportion lies between 0.329 and 0.361.
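A short sketch of the same calculation for a proportion, assuming the usual large-sample standard error sqrt(p(1 - p) / n). The function name `proportion_ci` is hypothetical.

```python
import math


def proportion_ci(x, n, z=1.96):
    """Large-sample CI for a proportion from x events in n subjects."""
    p_hat = x / n
    # Standard error of a sample proportion (large-sample approximation).
    standard_error = math.sqrt(p_hat * (1.0 - p_hat) / n)
    margin = z * standard_error
    return p_hat, p_hat - margin, p_hat + margin


# Hypertension example from the slide: 1,219 treated out of 3,532
p_hat, low, high = proportion_ci(1219, 3532)
print(f"p = {p_hat:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Working from the raw counts rather than the rounded proportion avoids carrying rounding error into the margin of error.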
CI for One Sample – Dichotomous Outcome (Practice)
Parameter: Proportion of population with diabetes
Sample N: 501
Sample Proportion: (91 / 501) = _______
Confidence Level: 95%
Z value: _______
95% Confidence Interval for p:
CI for One Sample – Dichotomous Outcome (Practice)
Parameter: Proportion of population with diabetes
Sample N: 501 (large sample, so use z value)
Sample Proportion: (91 / 501) = 0.1816
Confidence Level: 95%
Z value: 1.96
95% Confidence Interval for p:
= 0.1816 ± 1.96 x sqrt(0.1816 x 0.8184 / 501)
= 0.1816 ± 0.0338
= (0.148, 0.215)
From the sample, we estimate the proportion of persons with
diabetes to be 0.1816, and we are 95% confident that the
true proportion lies between 0.148 and 0.215.
SECTION 3.4
Calculation and interpretation of confidence intervals
Two Samples – Matched
a) Continuous outcome
CI for Two Samples – Matched Continuous Outcome
• Often used for intervention studies with a pre- and post-measurement
design (e.g. before and after treatment)
• Goal is to compare the mean score before and after the intervention
• Because the sample is matched (same persons completing pre- and
post-measurements), we cannot use the aggregate means; instead, use the
difference score for each subject (see below)

Subject ID    Pre    Post    Difference
1             158    132     -26
2             148    138     -10
3             152    158       6
4             155    131     -24

• Parameter of interest is the mean difference, denoted μd
• The CI also requires the SD of the difference scores, denoted sd
CI for Two Samples – Matched Continuous Outcome
Parameter: Mean difference in depressive symptom scores
after taking a new drug: X̄d = -12.7
Sample N: 100 (number of persons, not measurements)
Sample SD: SD of difference scores: sd = 8.9
Confidence Level: 95%
Z value: 1.96
95% Confidence Interval for μd:
= -12.7 ± 1.96 x (8.9 / sqrt(100))
= -12.7 ± 1.74
= (-14.4, -11.0)
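The summary statistics in a matched-design CI (the mean difference and the SD of the difference scores) come from subject-level differences. As a small illustration, the sketch below computes them for the four-subject pre/post table shown earlier in this section. The variable names are illustrative only.

```python
from statistics import mean, stdev

# Pre and post scores for subjects 1-4 from the matched-design table
pre = [158, 148, 152, 155]
post = [132, 138, 158, 131]

# One difference score per subject (post minus pre)
diffs = [after - before for before, after in zip(pre, post)]

mean_diff = mean(diffs)   # point estimate of the mean difference
sd_diff = stdev(diffs)    # sample SD of the difference scores (n - 1 denominator)

print("differences:", diffs)
print(f"mean difference = {mean_diff}, SD of differences = {sd_diff:.2f}")
```

With only four subjects a t value would be needed for the interval itself. The point here is only that the CI is built from the differences, not from the two group means.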
CI for Two Samples – Matched Continuous Outcome (Practice)
Parameter: Mean difference in anxiety symptom scores
after psychotherapy: X̄d = -14.8
Sample N: 52 (number of persons, not measurements)
Sample SD: SD of difference scores: sd = 9.6
Confidence Level: 90%
Z value: ______
CI for Two Samples – Matched Continuous Outcome (Practice)
Parameter: Mean difference in anxiety symptom scores
after psychotherapy: X̄d = -14.8
Sample N: 52 (number of persons, not measurements)
Sample SD: SD of difference scores: sd = 9.6
Confidence Level: 90%
Z value: 1.645
90% Confidence Interval for μd:
= -14.8 ± 1.645 x (9.6 / sqrt(52))
= -14.8 ± 2.19
= (-17.0, -12.6)
From the sample, we estimate a mean difference in anxiety scores of
-14.8 after undergoing psychotherapy, and we are 90% confident that the
true mean difference lies between -17.0 and -12.6.
SECTION 3.5
Calculation and interpretation of confidence intervals
Two Samples - Independent
a) Continuous – mean difference
b) Dichotomous – risk difference
c) Dichotomous – risk ratio
d) Dichotomous – odds ratio
CI for Two Samples – Independent Continuous Outcome
• Common parameter of interest is the difference in means between the
two groups, estimated by X̄1 – X̄2 and denoted for the population as: μ1 – μ2
• Since there are 2 independent groups, we also have:
n1 and n2, and s1 and s2
• If the sample variances are approximately equal, then we can “pool”
the standard deviations, s1 and s2. A typical rule of thumb to pool is:
s1² / s2² > 0.5 and s1² / s2² < 2.0
• The pooled (common) standard deviation is a weighted average:
Sp = sqrt(((n1 – 1) x s1² + (n2 – 1) x s2²) / (n1 + n2 – 2))
CI for Two Samples – Independent Continuous Outcome
Parameter: Mean difference in systolic blood pressure
between a sample of men and a sample of women
X̄men = 128.2; n1 = 1623; s1 = 17.5
X̄women = 126.5; n2 = 1911; s2 = 20.1
Note: s1² / s2² = 0.76, so can use pooled SD (Sp)
Confidence Level: 95%
Z value: 1.96
Sp = sqrt(((1623 – 1) x 17.5² + (1911 – 1) x 20.1²) / (1623 + 1911 – 2))
= sqrt(359.12) = 19.0
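CI for Two Samples – Independent Continuous Outcome
Parameter: Mean difference in systolic blood pressure
between a sample of men and women
X̄men = 128.2; n1 = 1623; s1 = 17.5
X̄women = 126.5; n2 = 1911; s2 = 20.1
Pooled SD: Sp = 19.0; Z value: 1.96
95% Confidence Interval for μ1 – μ2:
= (X̄1 – X̄2) ± z x Sp x sqrt(1/n1 + 1/n2)
= (128.2 – 126.5) ± 1.96 x 19.0 x sqrt(1/1623 + 1/1911)
= 1.7 ± 1.26 = (0.44, 2.96)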
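The pooled-SD calculation and the resulting interval can be sketched as follows. The function names `pooled_sd` and `mean_diff_ci` are hypothetical. The code assumes the standard pooled-variance form, in which each sample variance is weighted by its degrees of freedom.

```python
import math


def pooled_sd(s1, n1, s2, n2):
    """Pooled SD: variances weighted by their degrees of freedom."""
    pooled_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance)


def mean_diff_ci(m1, s1, n1, m2, s2, n2, critical=1.96):
    """CI for mu1 - mu2 from two independent samples, using the pooled SD."""
    sp = pooled_sd(s1, n1, s2, n2)
    margin = critical * sp * math.sqrt(1.0 / n1 + 1.0 / n2)
    diff = m1 - m2
    return diff - margin, diff + margin


# Systolic blood pressure example: men vs. women
sp = pooled_sd(17.5, 1623, 20.1, 1911)
low, high = mean_diff_ci(128.2, 17.5, 1623, 126.5, 20.1, 1911)
print(f"Sp = {sp:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Because the code keeps Sp unrounded, its limits can differ from the hand calculation in the second decimal place.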
CI for Two Samples – Independent Continuous Outcome (Practice)
Parameter: Mean difference in depression scores
between a sample of men and women
X̄men = 5.77; n1 = 163; s1 = 7.674
X̄women = 6.86; n2 = 333; s2 = 8.714
Note: s1² / s2² = _________
Assume calculation of a 95% confidence interval
CI for Two Samples – Independent Continuous Outcome (Practice)
Parameter: Mean difference in depression scores
between a sample of men and women
X̄men = 5.77; n1 = 163; s1 = 7.674
X̄women = 6.86; n2 = 333; s2 = 8.714
Note: s1² / s2² = 0.78, so can use pooled SD (Sp)
Sp = sqrt((9540 + 25210) / 494) = 8.39
95% Confidence Interval for μ1 – μ2:
= (5.77 – 6.86) ± 1.96 x 8.39 x sqrt(1/163 + 1/333)
= (-2.66, 0.49)
From the sample, we estimate a mean difference in depression scores between men
and women of -1.09, and we are 95% confident that the true mean difference lies
between -2.66 and 0.49.
SPSS: Analyze > Compare Means > Independent Samples T Test > Test Variable > Grouping Variable > Options – CI percentage
CI for Two Samples – Independent: Risk Difference
• Parameter of interest is the risk difference for the incidence
proportions in the population, denoted as RD = p1 – p2
• For a sample, the point estimate for the risk difference is denoted
as: RD = p̂1 – p̂2
• CI for the risk difference:
(p̂1 – p̂2) ± z x sqrt(p̂1(1 – p̂1) / n1 + p̂2(1 – p̂2) / n2)
Example: Incidence of CVD in Smokers and Non-Smokers

                  No CVD    CVD         Total    Incidence
Current smoker    663       81 (x1)     744      p1 = 81 / 744 = 0.1089
Non-smoker        2757      298 (x2)    3055     p2 = 298 / 3055 = 0.0975
Total             3420      379         3799
CI for Two Samples – Independent: Risk Difference
• Example: Compare the incidence proportion of CHD among
smokers (exposed) and non-smokers (not exposed)
Smokers: n1 = 744; w/CHD (x1) = 81; p1 = 0.1089
Non-smokers: n2 = 3055; w/CHD (x2) = 298; p2 = 0.0975
Confidence Level: 95%
Z value: 1.96
95% Confidence Interval for RD:
= (0.1089 – 0.0975) ± 1.96 x sqrt(0.1089(1 – 0.1089) / 744 + 0.0975(1 – 0.0975) / 3055)
= 0.0114 ± 0.0247 = (-0.0133, 0.0361)
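A minimal sketch of the risk-difference interval from the 2 x 2 counts, assuming the usual large-sample standard error for a difference of two independent proportions. The function name `risk_difference_ci` is hypothetical.

```python
import math


def risk_difference_ci(x1, n1, x2, n2, z=1.96):
    """Return (RD, lower, upper) for p1 - p2 from event counts and group sizes."""
    p1, p2 = x1 / n1, x2 / n2
    # Variances of the two independent sample proportions add.
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    rd = p1 - p2
    margin = z * standard_error
    return rd, rd - margin, rd + margin


# Smokers: 81 events of 744; non-smokers: 298 events of 3055
rd, low, high = risk_difference_ci(81, 744, 298, 3055)
print(f"RD = {rd:.4f}, 95% CI = ({low:.4f}, {high:.4f})")
```

An interval that includes 0 (the null value for a difference) means the data are compatible with no difference in risk between the groups.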
CI for Two Samples – Independent: Risk Difference (Practice)
• Example: Compare the incidence proportion of sleep disorder
among persons on statins (exposed) and not on statins (not exposed)
Confidence Level: 95%
Z value: _______

                   Sleep OK    Sleep Dx    Total    Incidence
Statin user        91          14 (x1)     105      p1 = 14 / 105 = 0.1333
Non-statin user    369         28 (x2)     397      p2 = 28 / 397 = 0.0705
Total              460         42          502
CI for Two Samples – Independent: Risk Difference (Practice)
• Example: Compare the incidence proportion of sleep disorder
among persons on statins (exposed) and not on statins (not exposed)
Confidence Level: 95%
Z value: 1.96

                   Sleep OK    Sleep Dx    Total    Incidence
Statin user        91          14 (x1)     105      p1 = 14 / 105 = 0.1333
Non-statin user    369         28 (x2)     397      p2 = 28 / 397 = 0.0705
Total              460         42          502

95% Confidence Interval for RD:
= (0.1333 – 0.0705) ± 1.96 x sqrt(0.1333(1 – 0.1333) / 105 + 0.0705(1 – 0.0705) / 397)
= 0.063 ± 0.0697 = (-0.007, 0.133)
From the sample, we estimate that the absolute risk of sleep disorder is 0.063 higher in
statin users compared to non-users, and we are 95% confident that the true risk
difference lies between -0.007 and 0.133.
CI for Two Samples – Independent: Risk Ratio
• Parameter of interest is the ratio of the incidence proportions for the
population, denoted as RR = p1 / p2
• For a sample, the point estimate for the risk ratio (RR) is denoted as:
RR = p̂1 / p̂2
• Note that the RR does not follow a normal distribution, but the
natural log (ln) of the RR is approximately normally distributed and
is used to calculate the confidence interval. This entails 2 steps:
1. Calculate CI for ln(RR)
2. Calculate CI for RR (i.e. transform)
CI for ln(RR):
ln(RR) ± z x sqrt(((n1 – x1) / x1) / n1 + ((n2 – x2) / x2) / n2)
CI for RR:
(exp(Lower limit), exp(Upper limit))
CI for Two Samples – Independent: Risk Ratio
RR = p1 / p2
CI for ln(RR): ln(RR) ± z x sqrt(((n1 – x1) / x1) / n1 + ((n2 – x2) / x2) / n2)
CI for RR: (exp(Lower limit), exp(Upper limit))
• Example: Compare future risk of CHD among smokers (exposed)
and non-smokers (not exposed)
Smokers: n1 = 744; w/CHD (x1) = 81; p1 = 0.1089
Non-smokers: n2 = 3055; w/CHD (x2) = 298; p2 = 0.0975
Confidence Level: 95%
Z value: 1.96
RR = p1 / p2 = 0.1089 / 0.0975 = 1.12
CI for ln(RR):
= ln(1.12) ± 1.96 x sqrt((663 / 81) / 744 + (2757 / 298) / 3055)
= 0.113 ± 0.232 = (-0.119, 0.345)
CI for RR:
(exp(-0.119), exp(0.345)) = (0.89, 1.41)
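The two-step log-transform procedure can be sketched as below. The function name `risk_ratio_ci` is hypothetical, and the standard error of ln(RR) is an assumption here: it is the usual large-sample textbook form, which reproduces the margin of 0.232 in the worked example.

```python
import math


def risk_ratio_ci(x1, n1, x2, n2, z=1.96):
    """Return (RR, lower, upper) using a CI built on the ln(RR) scale."""
    rr = (x1 / n1) / (x2 / n2)
    # Assumed large-sample standard error of ln(RR).
    se_log = math.sqrt(((n1 - x1) / x1) / n1 + ((n2 - x2) / x2) / n2)
    # Step 1: CI for ln(RR).
    log_low = math.log(rr) - z * se_log
    log_high = math.log(rr) + z * se_log
    # Step 2: transform the limits back to the RR scale.
    return rr, math.exp(log_low), math.exp(log_high)


# Smokers: 81 events of 744; non-smokers: 298 events of 3055
rr, low, high = risk_ratio_ci(81, 744, 298, 3055)
print(f"RR = {rr:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Because the limits are exponentiated, the interval is not symmetric around the RR, and the null value to check for is 1 rather than 0.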
CI for Two Samples – Independent: Risk Ratio (Practice)
RR = p1 / p2
CI for ln(RR): ln(RR) ± z x sqrt(((n1 – x1) / x1) / n1 + ((n2 – x2) / x2) / n2)
CI for RR: (exp(Lower limit), exp(Upper limit))
• Example: Compare the future risk of sleep disorder among statin
users (exposed) versus non-statin users (not exposed)
Confidence Level: 95%
Z value: _______

                   Sleep OK    Sleep Dx    Total    Incidence
Statin user        91          14 (x1)     105      p1 = 14 / 105 = 0.1333
Non-statin user    369         28 (x2)     397      p2 = 28 / 397 = 0.0705
Total              460         42          502

RR = p1 / p2 = _______
CI for ln(RR): _______
CI for Two Samples – Independent: Risk Ratio (Practice)
RR = p1 / p2
CI for ln(RR): ln(RR) ± z x sqrt(((n1 – x1) / x1) / n1 + ((n2 – x2) / x2) / n2)
CI for RR: (exp(Lower limit), exp(Upper limit))
• Example: Compare the future risk of sleep disorder among statin
users (exposed) versus non-statin users (not exposed)
Confidence Level: 95%
Z value: 1.96

                   Sleep OK    Sleep Dx    Total    Incidence
Statin user        91          14 (x1)     105      p1 = 14 / 105 = 0.1333
Non-statin user    369         28 (x2)     397      p2 = 28 / 397 = 0.0705
Total              460         42          502

RR = p1 / p2 = 0.1333 / 0.0705 = 1.89
CI for ln(RR):
= ln(1.89) ± 1.96 x sqrt((91 / 14) / 105 + (369 / 28) / 397)
= 0.6366 ± 0.6044 = (0.0322, 1.2410)
CI for RR:
(exp(0.0322), exp(1.2410)) = (1.03, 3.46)
From the sample, we estimate that the risk of sleep disorder is 1.89 times higher in
statin users compared to non-users, and we are 95% confident that the true risk
ratio lies between 1.03 and 3.46.
CI for Two Samples – Independent: Odds Ratio
• Conceptually similar to risk ratio, yet the parameter of interest is the
odds ratio (OR), defined as:
OR = Odds of exposure among cases / Odds of exposure among controls
= (a / c) / (b / d)

               Cases    Controls
Exposed        a        b
Not exposed    c        d

Example: Prevalence of CVD in Smokers and Non-Smokers (95% C.I.)

                       CVD (D+)    No-CVD (D-)
Current smoker (E+)    81          663
Non-smoker (E-)        298         2757

OR = (81 / 298) / (663 / 2757) = 1.13
Z = 1.96
CI for ln(OR):
= ln(OR) ± z x sqrt(1/a + 1/b + 1/c + 1/d)
= 0.122 ± 0.260 = (-0.138, 0.382)
CI for OR:
(exp(-0.138), exp(0.382)) = (0.87, 1.47)
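The odds-ratio interval follows the same log-transform pattern. The sketch below uses the a, b, c, d cell labels from the slide. The function name `odds_ratio_ci` is hypothetical, and the standard error sqrt(1/a + 1/b + 1/c + 1/d) is assumed as the standard large-sample form. It reproduces the margin of 0.260 in the worked example.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Return (OR, lower, upper) for a 2 x 2 table with cells a, b, c, d.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    odds_ratio = (a / c) / (b / d)
    # Assumed large-sample standard error of ln(OR).
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_low = math.log(odds_ratio) - z * se_log
    log_high = math.log(odds_ratio) + z * se_log
    return odds_ratio, math.exp(log_low), math.exp(log_high)


# CVD example: a = 81, b = 663, c = 298, d = 2757
odds_ratio, low, high = odds_ratio_ci(81, 663, 298, 2757)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The formula needs all four cells to be non-zero. A table with an empty cell would require a different approach that is not covered here.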
CI for Two Samples – Independent: Odds Ratio (Practice)
OR = Odds of exposure among cases / Odds of exposure among controls
Prevalence of Sleep Disorder Among Statin and Non-Statin Users (95% C.I.)

                        Sleep Dx    Sleep OK
Statin user (E+)        14          91
Non-statin user (E-)    28          369

               Cases    Controls
Exposed        a        b
Not exposed    c        d

OR = (a / c) / (b / d) = _________
Z = ___________
CI for ln(OR): _______
CI for OR: (exp(Lower limit), exp(Upper limit))
CI for Two Samples – Independent: Odds Ratio (Practice)
OR = Odds of exposure among cases / Odds of exposure among controls
Example: Prevalence of Sleep Disorder Among Statin and Non-Statin Users

                        Sleep Dx    Sleep OK
Statin user (E+)        14          91
Non-statin user (E-)    28          369

               Cases    Controls
Exposed        a        b
Not exposed    c        d

OR = (14 / 28) / (91 / 369) = 2.027
Z = 1.96
CI for ln(OR):
= 0.7066 ± 0.6813 = (0.0253, 1.3879)
CI for OR:
(exp(0.0253), exp(1.3879)) = (1.03, 4.01)
From the sample, we estimate that the odds of statin use among persons with sleep
disorder are 2.03 times higher than the odds of statin use among persons without
sleep disorder, and we are 95% confident that the true odds ratio lies between
1.03 and 4.01.
SECTION 3.6
Use of SPSS to calculate confidence intervals
CI for Two Samples – Independent: Odds Ratio (Practice)
Example: Prevalence of Sleep Disorder Among Statin and Non-Statin Users

                        Sleep Dx    Sleep OK
Statin user (E+)        14          91
Non-statin user (E-)    28          369

OR = (14 / 28) / (91 / 369) = 2.027
CI for ln(OR): = 0.7066 ± 0.6813 = (0.0253, 1.3879)
CI for OR: (exp(0.0253), exp(1.3879)) = (1.03, 4.01)
SPSS: Analyze > Descriptive Statistics > Crosstabs > Row and Column Variable > Statistics (check “Risk”)
Result: OR = 2.027; 95% C.I. = 1.03, 4.01
[Figure: number line for the odds ratio, bounded at 0 and unbounded above, marking the null value (1.0), the lower limit (1.03), the point estimate (OR = 2.03), and the upper limit (4.01)]
Note:
The confidence interval for a continuous measure such as a mean or a difference in means is
symmetric around the point estimate.
In contrast, for the risk ratio and odds ratio, the confidence interval is skewed to the
right of the point estimate. This is because:
a) Values for RR and OR have a lower bound of 0 yet no upper bound
b) The C.I. limits are calculated on the log scale and then transformed back with an exponential function
SECTION 3.7
Relationship between the risk ratio and the odds ratio
Odds Ratio & Risk Ratio
Relationship between RR and OR:
The odds ratio will provide a good estimate of the
risk ratio when:
1. The outcome (disease) is rare
or
2. The effect size is small or modest
Odds Ratio & Risk Ratio
The odds ratio will provide a good estimate of the
risk ratio when:
1. The outcome (disease) is rare

       D+    D-
E+     a     b
E-     c     d

RR = [a / (a + b)] / [c / (c + d)]
OR = (a / c) / (b / d) = (ad) / (bc)
If the disease is rare, then cells (a) and (c) will be small,
so that a + b ≈ b and c + d ≈ d:
RR = [a / (a + b)] / [c / (c + d)] ≈ (a / b) / (c / d) = (ad) / (bc) = OR
Odds Ratio & Risk Ratio
The odds ratio will provide a good estimate of the
risk ratio when:
2. The effect size is small or modest.

       D+     D-
E+     40     60
E-     120    180

OR = (40 / 120) / (60 / 180) = 0.333 / 0.333 = 1.0
RR = [40 / (40 + 60)] / [120 / (120 + 180)] = 0.40 / 0.40 = 1.0
Odds Ratio & Risk Ratio
Finally, we expect the risk ratio to be closer to the null
value of 1.0 than the odds ratio. Therefore, be especially
cautious when interpreting the odds ratio as a measure
of relative risk when the outcome is not rare and the
effect size is large.

       D+    D-
E+     20    30
E-     10    90

OR = (20 / 10) / (30 / 90) = 2.0 / 0.333 = 6.0
RR = (20 / 50) / (10 / 100) = 0.40 / 0.10 = 4.0
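To see how the two measures diverge when the outcome is common and the effect is large, the sketch below computes both from the same 2 x 2 table. The function name `or_and_rr` is hypothetical. The rows are exposure (E+, E-) and the columns are disease (D+, D-), as on the slides.

```python
def or_and_rr(a, b, c, d):
    """Return (OR, RR) for a 2 x 2 table.

    a = E+ and D+, b = E+ and D-, c = E- and D+, d = E- and D-.
    """
    odds_ratio = (a * d) / (b * c)
    risk_ratio = (a / (a + b)) / (c / (c + d))
    return odds_ratio, risk_ratio


# The two tables from the slides in this section
tables = {
    "no effect": (40, 60, 120, 180),
    "common outcome, large effect": (20, 30, 10, 90),
}
for label, cells in tables.items():
    odds_ratio, risk_ratio = or_and_rr(*cells)
    print(f"{label}: OR = {odds_ratio:.1f}, RR = {risk_ratio:.1f}")
```

In the second table 40% of the exposed group has the outcome, so the rare-disease approximation fails and the OR (6.0) overstates the RR (4.0).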