Cases - SACEMA

Advanced Epi
August 15-19th 2011
SACEMA
Matthew Fox
Boston University
Center for Global Health and Development
Department of Epidemiology
Health Economics and Epidemiology Research Office
mfox@bu.edu
Introductions
Who are you?
 - Where do you work/study?
 - What do you study?
About me

Welcome
Week-long short course on epi methods
 - 2 sessions/day, each about 3 hours (depending)
 - Assumes intro/intermediate epi, practical experience with epi and stats
Mix of lecture and discussion
 - Too much material: take good notes, go back to them
Finish mid-day on Friday
Course works if you read and participate
Course Overview

Review basic epidemiologic principles
 - Reinterpret them in a new light
Think through problems/implications of what we learned in intro/intermed epi
 - Develop a causal framework(s) on which to hang our epidemiologic thinking

Learn/apply advanced epi methods
Modern Epidemiology III
Questions for Today


What is epidemiology, and what is its goal?
What are measures of association and measures of effect?
What do these measures really mean?
 - Which ones have causal meanings?
 - What is the odds ratio really about?
 - Why does everyone use it?
The goal of epidemiologic research
Epidemiology is the study of:
 - The distribution and determinants of disease in human populations and the application of that knowledge to the control of disease
But the goal is:
 - To obtain a valid and precise (and generalizable) estimate of the effect of an exposure on a disease
Validity is the opposite of bias; precision is the opposite of random error
 - Fundamentally concerned with measurement
Anyone remember
Type I and Type II
error?
What are they?
Basic Statistics
                    Truth about null
Our study           Effect                  No effect
Effect              Correct                 Type I error (alpha)
No effect           Type II error (beta)    Correct
Type I: If there is truly no effect, what is the chance we reject the null?
Type II: If there truly is an effect, what is the chance we fail to reject the null?
How do we know a particular
epidemiologic finding is true?
Find that the relative risk of exposure to vitamin # on cancer @ is 2.5, p=0.049
Assume we did the perfect study
 - No bias (confounding, selection, information)
 - 80% power, alpha = 0.05
What is the chance there is really no effect of vitamins on cancer?
 - i.e., true relative risk is 1
Syphilis testing in the US

In the US before 2005, Massachusetts required a syphilis test before marriage
Assume the test was:
 - 95% sensitive and 95% specific
If I test positive, how likely is it that I truly have syphilis?
 - Answer is that it depends
Syphilis
Se = 95%, Sp = 95%, prevalence is 1%
              Truth +    Truth -    Total
Test +           95        495        590
Test -            5       9405       9410
Total           100       9900     10,000
PPV = 95/590 = 16%
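The table above can be rebuilt from the three inputs (prevalence, Se, Sp). A minimal sketch; the function name `ppv_table` and the population size of 10,000 are illustrative choices, not part of the slides.

```python
def ppv_table(prevalence, sensitivity, specificity, n=10_000):
    """Return the expected 2x2 cell counts and the positive predictive value."""
    diseased = prevalence * n
    healthy = n - diseased
    true_pos = sensitivity * diseased      # diseased, test positive
    false_neg = diseased - true_pos        # diseased, test negative
    true_neg = specificity * healthy       # healthy, test negative
    false_pos = healthy - true_neg         # healthy, test positive
    cells = {
        "true_pos": true_pos,
        "false_pos": false_pos,
        "false_neg": false_neg,
        "true_neg": true_neg,
    }
    ppv = true_pos / (true_pos + false_pos)
    return cells, ppv


cells, ppv = ppv_table(prevalence=0.01, sensitivity=0.95, specificity=0.95)
for name, count in cells.items():
    print(f"{name:>9}: {count:.0f}")
print(f"PPV = {ppv:.0%}")  # 95 / 590, about 16%
```

Changing only `prevalence` shows why the answer "depends": the same test gives a very different PPV in a high-prevalence population.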
Back to our study
                    Truth
Our study           Effect                  No effect
Effect              Correct                 Type I error (alpha)
No effect           Type II error (beta)    Correct
Alpha and beta use the TRUTH as the denominator and so are like Se and Sp
Judging the “correctness” of a single study is a question of PPV, which depends on the prevalence of true hypotheses
Back to our study
alpha = 5% (Sp 95%), beta = 5% (Se 95%)
Prevalence of true hypotheses is: 10%
                Truth +    Truth -    Total
Our study +       950        450       1400
Our study -        50       8550       8600
Total            1000       9000     10,000
68% chance our study is right (950/1400)
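The same arithmetic can be written as a function of the prevalence of true hypotheses. This is a sketch only: `prob_finding_is_true` is a hypothetical name, and the priors other than 10% are illustrative values that do not appear in the slides.

```python
def prob_finding_is_true(prior, alpha=0.05, power=0.95):
    """P(effect is real | study rejects the null), treating the study like a test.

    prior: assumed prevalence of true hypotheses
    alpha: Type I error rate (1 - "specificity")
    power: 1 - beta (the "sensitivity")
    """
    true_positive = power * prior
    false_positive = alpha * (1 - prior)
    return true_positive / (true_positive + false_positive)


# Slide scenario: alpha = 5%, beta = 5%, 10% of hypotheses true.
print(f"{prob_finding_is_true(0.10):.0%}")

# Illustrative priors (assumptions) with the 80% power of the vitamin example.
for prior in (0.01, 0.10, 0.50):
    print(prior, round(prob_finding_is_true(prior, power=0.80), 3))
```

With 80% power and a 10% prior the value drops to 0.64, so a "significant" result from even a perfect study is far from certain to be true.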
Take home message:
We need to critically
examine the way we
have been taught to
design and interpret
epidemiologic research
Review of basic
concepts
Study design, measures of
disease frequency, measures
of effect/association
The Source Population
The population that gives rise to cases
It is defined:
 - In time and place
 - With respect to population characteristics
 - With respect to external influences (modifiers)
 - Not as a sample of the general population
Cohorts

Membership in a cohort requires that a person meet admissibility criteria
 - Have common admissibility-defining events
Membership begins once the temporally last criterion is met
 - Once a member, a person never leaves (membership is static or closed)
 - A closed cohort adds no new members and loses members only to death; an open cohort keeps adding new members
Dynamic population

Membership requires that a person satisfy the membership status criteria
 - They have common admissibility-defining characteristics
Membership exists so long as all of the status criteria are satisfied
 - A person can enter a dynamic population, leave it, and then re-enter

Cohorts vs. Dynamic Populations

Framingham heart study
 - Cohort: the admissibility criteria are enrolling in the study in 1948. Never leave the cohort once you enroll.
 - Dynamic population: could have instead studied all residents of Framingham from 1948 onwards, the catchment population for a case registry there. Some will leave, new people will join.
STUDY DESIGN: How to harvest information from the base
Census (cohort) or Sample (case-control)
Cases are valuable (information rich)
 - In SE calcs, these drive your standard error
   Ex. SE(ln(RR)) = sqrt(1/A - 1/N1 + 1/B - 1/N0)
 - Include all the cases in the population
Information density of population that gave rise to cases is not great
 - Can include all or sample
 - Nearly all of the base’s info is harvested when the sample of the base is a small multiple of the cases
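A rough numerical illustration of the points above. `se_log_rr` is the formula from the slide; `se_log_or` uses the standard Woolf formula for a case-control odds ratio, which is not in the slides and assumes independently sampled controls. The case counts and the 30% exposure prevalence in the base are made-up values.

```python
import math


def se_log_rr(a, n1, b, n0):
    """SE of ln(RR) in a cohort: sqrt(1/A - 1/N1 + 1/B - 1/N0)."""
    return math.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n0)


def se_log_or(a, b, c, d):
    """Woolf SE of ln(OR): sqrt(1/A + 1/B + 1/C + 1/D) (assumption, see lead-in)."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


# Hypothetical study: 100 cases (40 exposed, 60 unexposed), 30% of the base exposed.
EXPOSED_CASES, UNEXPOSED_CASES = 40, 60
BASE_EXPOSED_FRACTION = 0.30


def se_for_control_ratio(controls_per_case):
    """SE of ln(OR) when the base sample is a given multiple of the cases."""
    n_controls = controls_per_case * (EXPOSED_CASES + UNEXPOSED_CASES)
    c = n_controls * BASE_EXPOSED_FRACTION
    d = n_controls * (1 - BASE_EXPOSED_FRACTION)
    return se_log_or(EXPOSED_CASES, UNEXPOSED_CASES, c, d)


# Lower limit set by the cases alone (infinitely many controls).
se_floor = math.sqrt(1 / EXPOSED_CASES + 1 / UNEXPOSED_CASES)

for ratio in (1, 2, 4, 10, 100):
    print(f"{ratio:>3} controls per case: SE = {se_for_control_ratio(ratio):.3f}")
print(f"cases-only floor:      SE = {se_floor:.3f}")
```

Under these assumptions the standard error at 4 controls per case is already within about 15% of the floor, and adding further controls buys very little.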
Which is the best
measure to assess
causal effects?
1) Risk Difference
2) Risk Ratio
3) Odds Ratio
In a case-control study, from what population do we sample controls?
1) Those with disease
2) Those without disease
3) Everyone, regardless of whether they have the disease
Cohort Study
Case-control Study
Kramer and Boivin 1987
“We define a cohort study as a study in which subjects are followed forward from exposure to outcome… Inferential reasoning is from cause to effect. In case-control studies, the directionality is the reverse. Study subjects are investigated backwards from outcome to exposure, and the reasoning is from effect to cause.”
Cohort Study: Relative Risks
             Index (E+)    Reference (E-)
Cases            A               B
Non-cases        C               D
Total            N1              N0
Relative risk: (A/N1) / (B/N0)
 - Risk in exposed / risk in unexposed
 - Risk is number of cases / total at risk
 - Numerator is number of cases
 - Denominator is cases and controls!
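Cohort Concept
[Figure: exposed (NE+) and unexposed (NE-) groups followed from t0 to t. Exposed cases = A, exposed non-cases C = NE+ - A; unexposed cases = B, unexposed non-cases D = NE- - B.]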
Cohort Study: Relative Risks
             Index (E+)    Reference (E-)
Cases            A               B
Non-cases        C               D
Total            N1              N0
Relative risk:
 - (A/N1)/(B/N0) can be rearranged as (A/B)/(N1/N0)
 - A/B is ratio of exposed to unexposed cases
 - N1/N0 is ratio of exposed to unexposed in population
Relative risk has meaning: average increase in risk produced by exposure
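The rearrangement is one line of algebra, written out here with the same symbols as the table (A, B, N1, N0):

```latex
\[
RR \;=\; \frac{A/N_1}{B/N_0}
   \;=\; \frac{A}{N_1}\cdot\frac{N_0}{B}
   \;=\; \frac{A}{B}\cdot\frac{N_0}{N_1}
   \;=\; \frac{A/B}{N_1/N_0}
\]
```

The numerator A/B needs only the cases; the denominator N1/N0 needs only the exposure distribution of the population that gave rise to them.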
Case-control: Cases

Members of population who develop disease over the follow-up period
 - Same cases as the analogous cohort study
Case ascertainment is influenced by design
 - Primary base: population defined first
 - Secondary base: cases defined first

Case-control: Controls
A sample of the population experience that gave rise to the cases
3 options (paradigms)
 - Un-diseased experience
 - Population at risk at beginning of the study
 - Population experience over follow-up
             0 mos   6 mos   12 mos   18 mos   24 mos
Cases          0       5       10       15       20
Non-cases    100      95       90       85       80
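Case-control Concept
[Figure: cohort diagram (NE+ and NE- followed from t0 to t; exposed cases A, unexposed cases B, non-cases C = NE+ - A and D = NE- - B) with three control-sampling options marked. Option 1: Cumulative. Option 2: Case-cohort. Option 3: Density Sampling.]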
Case-control study
             Index    Reference
Cases          A          B
Controls       C          D
Now we can’t estimate risk A/N1 and B/N0 because we don’t know the denominators
Left with an odds ratio
 - But how to interpret?
2 ways to calculate an OR
             Index    Reference
Cases          A          B
Controls       C          D
Cross product ratio: (A*D)/(B*C)
 - Not particularly meaningful, but it works
Case ratio/base ratio: (A/B) / (C/D)
 - A/B is the ratio of exposed to unexposed cases
 - C/D is the ratio of exposed to unexposed controls
 - Remember back to Relative Risk
 - Here C/D fills in for N1/N0
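A quick check that the two calculations agree. The function names are illustrative, and the counts (400, 100, 60, 90) are borrowed from the trohoc fallacy example in this deck.

```python
import math


def cross_product_or(a, b, c, d):
    """Odds ratio as the cross product ratio (A*D)/(B*C)."""
    return (a * d) / (b * c)


def case_base_ratio_or(a, b, c, d):
    """Odds ratio as case ratio over control ratio, (A/B)/(C/D)."""
    return (a / b) / (c / d)


# A, B = exposed and unexposed cases; C, D = exposed and unexposed controls.
a, b, c, d = 400, 100, 60, 90

print(f"cross product ratio:   {cross_product_or(a, b, c, d):.2f}")
print(f"case ratio/base ratio: {case_base_ratio_or(a, b, c, d):.2f}")
print(math.isclose(cross_product_or(a, b, c, d), case_base_ratio_or(a, b, c, d)))
```

The arithmetic is identical; only the second form makes visible that the controls are standing in for N1/N0.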
The trohoc fallacy
Full cohort:
             Index    Reference
Cases         400        100
Non-cases     600        900
Total        1000       1000
RR = (400/1000) / (100/1000) = 4.0

Case-control study, 10% sample of non-cases:
             Index    Reference
Cases         400        100
Non-cases      60         90
Total        Not sampled
OR = (400/60) / (100/90) = 6.0

The trohoc fallacy is the idea that a case-control study is a cohort study done backwards (heteropalindrome)
Requires a rare disease assumption for the odds ratio to approximate the relative risk
Case-control Concept
[Figure: cohort diagram (NE+ and NE- followed from t0 to t; exposed cases A, unexposed cases B, non-cases C = NE+ - A and D = NE- - B) with Option 1 (Cumulative) and Option 2 (Case-cohort) marked, annotated "10% sample of population that gave rise to cases".]
The trohoc fallacy revealed
Full cohort:
             Index    Reference
Cases         400        100
Non-cases     600        900
Total        1000       1000
RR = (400/1000) / (100/1000) = 4.0

Case-control study, controls sampled from the base:
             Index    Reference
Cases         400        100
Non-cases    Not sampled
Controls      100        100
OR = (400/100) / (100/100) = 4.0

Sample total population that gave rise to cases (which includes cases), not undiseased at end
 - Cases can be their own controls if randomly sampled
Requires no rare disease assumption
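The two sampling schemes can be compared side by side. A sketch using expected control counts rather than random draws; the function names and the `fraction` parameter are illustrative, and the cohort numbers are the ones on the slide.

```python
import math


def relative_risk(cases_exp, n_exp, cases_unexp, n_unexp):
    """Cohort relative risk (A/N1)/(B/N0)."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


def or_sampling_non_cases(cases_exp, n_exp, cases_unexp, n_unexp, fraction):
    """Controls drawn only from those still undiseased at the end of follow-up."""
    c = fraction * (n_exp - cases_exp)      # expected exposed controls
    d = fraction * (n_unexp - cases_unexp)  # expected unexposed controls
    return (cases_exp / cases_unexp) / (c / d)


def or_sampling_base(cases_exp, n_exp, cases_unexp, n_unexp, fraction):
    """Controls drawn from the whole population that gave rise to the cases."""
    c = fraction * n_exp
    d = fraction * n_unexp
    return (cases_exp / cases_unexp) / (c / d)


cohort = dict(cases_exp=400, n_exp=1000, cases_unexp=100, n_unexp=1000)

print(f"RR                       = {relative_risk(**cohort):.1f}")
print(f"OR, 10% of non-cases     = {or_sampling_non_cases(**cohort, fraction=0.10):.1f}")
print(f"OR, 10% of the full base = {or_sampling_base(**cohort, fraction=0.10):.1f}")
```

The sampling fraction cancels in both functions, so the gap between 6.0 and 4.0 comes entirely from which population the controls represent, not from how many are drawn.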
Miettinen on the trohoc fallacy


“Consider the clinical trial: the concern is, as
always, to contrast categories of treatment as to
subsequent occurrence of some outcome
phenomenon, whereas comparing different
categories of the outcome as to the antecedent
distribution of treatment is uninteresting if not
downright perverse.”
Miettinen preferred terms like “case-referent” and “case-base” studies, as “the base sample is no more a control series than a census of the base is”
Why it works
             Index    Ref
Cases          A       B
Non-case       C       D
Total          N1      N0
OR = [A*D] / [B*C] = [A/B] / [C/D]
 - If we sample 10% of the base then the odds ratio is:
   OR = [A/B] / [(10%*N1)/(10%*N0)] = [A/B] / (N1/N0) = RR
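Nothing in the argument is specific to 10%. Writing the sampling fraction as f (a symbol introduced here for illustration) and assuming controls are drawn from the whole base with the same fraction regardless of exposure, the expected control counts are C = f N1 and D = f N0, so:

```latex
\[
OR \;=\; \frac{A/B}{C/D}
   \;=\; \frac{A/B}{(f\,N_1)/(f\,N_0)}
   \;=\; \frac{A/B}{N_1/N_0}
   \;=\; \frac{A/N_1}{B/N_0}
   \;=\; RR
\]
```

The fraction f cancels, which is why the size of the base sample affects precision but not what the odds ratio estimates.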
Cohort studies exclude those who are not at risk for disease (though they don’t need to). In a case-control study, should we exclude those not at risk for exposure?
Ex. In a study of hormonal contraception and heart disease, should we exclude nuns?
With appropriate sampling, the odds ratio is interpreted as an estimate of the relative risk, which has meaning.
Case-control studies are cohort studies done efficiently, not cohort studies done backwards.
Measures of Disease Frequency

Provide an estimate of the occurrence of disease in a population
 - Typically we study first occurrence as later occurrences are often affected by first
Incorporates:
 - Disease state
 - Time
 - Population definition
Measures of Disease Frequency

Prevalence:
 - Proportion of population with disease at a particular time
 - Cross-sectional
 - Reflects rate of disease occurrence and survival with disease
Measures of Disease Frequency

Cumulative Incidence (Simple)
 - Proportion of a population that develops disease over a follow-up period
 - Also called incidence proportion or risk
 - Bounded by 0 and 1
 - Time not part of measure but must be reported
 - Difficult to measure in dynamic populations
CI(t0,t) = I(t0,t)/N0
Measures of Disease Frequency

Incidence rate (density)
 - Number of newly developed cases divided by accumulated person time
 - Time is part of the denominator
 - Can be used in dynamic populations/cohorts
 - Ignores distinction between individuals (2/100 py could be 2 followed 50 yrs each, both get event, or 100 followed 1 yr each, 2 get event)
$$IR(t_0,t) = \frac{I(t_0,t)}{PT}, \qquad \text{where } PT = \sum_{i=1}^{N} \Delta t_i \ \text{ or } \ PT = N\,\Delta t$$
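The two scenarios in the parenthetical can be checked directly. A minimal sketch; `incidence_rate` is an illustrative helper that takes each person's follow-up time as given.

```python
def incidence_rate(n_events, person_times):
    """New cases divided by accumulated person-time (the sum of each person's follow-up)."""
    total_person_time = sum(person_times)
    return n_events / total_person_time


# 2 people followed 50 years each, both get the event.
few_followed_long = incidence_rate(2, [50, 50])

# 100 people followed 1 year each, 2 get the event.
many_followed_briefly = incidence_rate(2, [1] * 100)

print(few_followed_long, many_followed_briefly)  # both are 2 per 100 person-years
```

The rate alone cannot tell these two studies apart, which is the sense in which it ignores the distinction between individuals.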
Measures of Disease Frequency

Rules for counting person time
 - Start disease free, free of history of disease at entry
 - At risk for outcome? Not necessary, but wasteful
 - Start after exposure is complete (not during) and after minimum induction period
 - Stop when disease occurs (date or midpoint)
 - Stop if withdrawn (lost to follow up, death from another cause, study ends, no longer at risk)
Only those eligible to be counted in numerator are in denominator
 - Ask: if they became a case, would I have counted them?
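One way to turn these rules into a calculation, as a sketch only. The `Subject` record, its field names and the example times are all hypothetical; the function counts time from exposure completion plus the induction period until the first of event, withdrawal or study end, and counts an event as a case only if it falls inside that window.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Subject:
    exposure_complete: float                 # time at which exposure is complete
    event_time: Optional[float] = None       # time of disease onset, if any
    withdrawal_time: Optional[float] = None  # LTFU, other death, no longer at risk


def person_time(subject: Subject, induction: float, study_end: float) -> Tuple[float, bool]:
    """Return (person-time contributed, counted as a case?) for one subject."""
    start = subject.exposure_complete + induction
    stops = [study_end]
    if subject.event_time is not None:
        stops.append(subject.event_time)
    if subject.withdrawal_time is not None:
        stops.append(subject.withdrawal_time)
    stop = min(stops)
    if stop <= start:
        # Follow-up ended before time could start: no person-time and no case.
        return 0.0, False
    is_case = subject.event_time is not None and subject.event_time == stop
    return stop - start, is_case


subjects = [
    Subject(0.0, event_time=3.0),       # case inside the window
    Subject(0.0, event_time=0.5),       # event during induction: not counted
    Subject(0.0, withdrawal_time=4.0),  # withdrawn, no event
    Subject(0.0),                       # followed to study end
]
results = [person_time(s, induction=1.0, study_end=10.0) for s in subjects]
total_time = sum(t for t, _ in results)
cases = sum(1 for _, is_case in results if is_case)
print(cases, total_time, cases / total_time)
```

The second subject shows the numerator/denominator rule: an event that would not have been counted as a case contributes no person-time either.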
Person Time Issues I

We conduct a cohort study of continuous smoking vs. no smoking and prostate cancer
 - Enroll 1000 smokers and 1000 non-smokers
At end, find 100 non-smokers became smokers. Should we exclude them?
 - Can’t, because if they became cases while not smoking we would have included them
Person Time Issues II

Study HAART regimens and death
 - But much death and LTFU in first 6 months and we care about long-term mortality
Exclude any deaths in first 6 months
 - OK if all we care about is long-term effects
When should person time start?
 - Immortal person-time biases towards null
[Figure: follow-up timelines with black triangles marking disease events. Person-time labels on the timelines: eight values of 5 and one of 2 (sum 42). Prevalence = 2/8 = 0.25; Cum Inc = 2/9; Inc Rate = 2/42.]
Measure of Effect

Comparison of occurrence of outcome in the same population at same time under two different conditions
 - Only one can be observed
 - Second is “counterfactual” (we will come back to this)
Theoretical, so we substitute a measure of association
 - But as an approximation to measure of effect
Measures of Association


Comparison of incidence in 2+ populations
Relative:
 - Comparison by division
 - Null (no effect) is 1
 - Log scale (distance from 0 to 1 is same as 1 to infinity)
Difference:
 - Comparison by subtraction
 - Null (no effect) is 0
 - Distance above and below null is equivalent
Calculations
$$RD = CI_{E+} - CI_{E-} \qquad IRD = IR_{E+} - IR_{E-}$$
$$RR = \frac{CI_{E+}}{CI_{E-}} \qquad IRR = \frac{IR_{E+}}{IR_{E-}}$$
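The four measures can be computed in a few lines. A sketch: the cumulative incidences reuse the 400/1000 and 100/1000 cohort from the trohoc example, while the incidence rates (30 and 10 cases per 1000 person-years) are made-up values for illustration.

```python
import math


def difference(exposed, unexposed):
    """Difference measure (RD or IRD): comparison by subtraction, null is 0."""
    return exposed - unexposed


def ratio(exposed, unexposed):
    """Relative measure (RR or IRR): comparison by division, null is 1."""
    return exposed / unexposed


# Cumulative incidence in exposed and unexposed (from the cohort example).
ci_exposed, ci_unexposed = 400 / 1000, 100 / 1000
# Incidence rates per person-year (hypothetical).
ir_exposed, ir_unexposed = 30 / 1000, 10 / 1000

print(f"RD  = {difference(ci_exposed, ci_unexposed):.2f}")
print(f"RR  = {ratio(ci_exposed, ci_unexposed):.2f}")
print(f"IRD = {difference(ir_exposed, ir_unexposed):.3f} per person-year")
print(f"IRR = {ratio(ir_exposed, ir_unexposed):.2f}")

# On the log scale, a ratio and its inverse sit at equal distance from the null.
print(math.isclose(math.log(4.0), -math.log(0.25)))
```

The last line illustrates the log-scale point for relative measures: RR = 4 and RR = 0.25 are equally far from the null of 1 once logged.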
Conclusion


Objective is a VALID and PRECISE estimate of the effect of an exposure on an outcome
Need to think critically about the logic of the methods we have been taught
 - Make sure we understand how to validly design studies and how to correctly interpret study findings
Odds ratios are odd
 - Correct sampling means we can reduce reliance on them