Önder Ergönül, MD, MPH
Koç University, School of Medicine
Summer Course on Research Methodology in Health Sciences
June 16, 2015, Istanbul
National Case-Control Study of Kaposi’s Sarcoma and Pneumocystis carinii Pneumonia in Homosexual Men
Jaffe HW, et al. Ann Intern Med 1983; 99: 145-151.
Objective
To identify risk factors for the occurrence of Kaposi's sarcoma and Pneumocystis carinii pneumonia in homosexual men, we conducted a case-control study in New York City, San Francisco, Los Angeles, and Atlanta.
Methods
Fifty patients (cases) (39 with Kaposi's sarcoma, 8 with pneumocystis pneumonia, and 3 with both) and 120 matched homosexual male controls (from sexually transmitted disease clinics and private medical practices) participated in the study.
Results
Compared with controls, cases had a larger number of male sex partners.
Cases were also more likely to have been exposed to feces during sex, have had syphilis and non-B hepatitis, have been treated for enteric parasites, and have used various illicit substances.
Italians (HLA-DR5)
Françoise Barré-Sinoussi, Luc Montagnier, Harald zur Hausen
• Purpose: to identify or designate a group of people (cohort) and follow them over time
– Exposure is typically measured prior to the onset of disease
[Figure: Population → Cohort → Exposure Assessment 1 … Exposure Assessment N → Disease Assessment]
• Cohort Studies are inefficient in the event of:
1. Significant time between exposure and measurable disease outcome
2. Rare outcomes
[Figure: Exposure Assessment → many, many, many years → Disease Appearance; Initial Cohort (N=5,000) → Diseased Group Identified (N=30)]
• In these cases, cohort studies are often expensive
1. Define source population: group that yielded the cases (people with the disease outcome)
2. Determine which cases were/were not exposed to the variable(s) of interest
3. Sample control group from the source population
• Controls must be sampled independently of exposure
4. Determine which members of the control group were/were not exposed to the variables of interest
5. Compare rates of exposure in the case and control groups
1. Define source population: group that yielded the cases
2. Sample cases
3. Sample control group (from the source population)
4. Determine exposure for all participants
5. Compare rates of exposure in cases vs. controls
– Compare using statistics
– Odds ratios, chi-square, etc.
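The comparison in step 5 can be sketched in a few lines of Python. This is a minimal illustration, not a full analysis: the counts are hypothetical, and the labels a, b (exposed and unexposed cases) and c, d (exposed and unexposed controls) are a naming assumption.

```python
def odds_ratio(a, b, c, d):
    """Exposure odds ratio for a 2x2 table.

    a, b = exposed / unexposed cases; c, d = exposed / unexposed controls.
    """
    return (a * d) / (b * c)


def chi_square(a, b, c, d):
    """Pearson chi-square statistic (1 df, no continuity correction)."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Hypothetical counts, for illustration only
a, b, c, d = 40, 60, 20, 80
print(f"OR = {odds_ratio(a, b, c, d):.2f}")
print(f"chi-square = {chi_square(a, b, c, d):.2f}")
```

A chi-square value above 3.84 corresponds to p < 0.05 with one degree of freedom.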
• Controls provide (an estimate of) the background rate of exposure
1) Controls must be representative of the source population that produced the cases
2) Control selection must be independent of exposure
[Figure: source population of exposed and unexposed people, showing the cases and a control group; the control group should contain a representative exposure distribution]
Selection of Controls
Controls must be representative of the source population
• Example: Is a new blood pressure medication associated with myocardial infarction?
– Cases – patients with MI from the cardiology ward of hospital
– Controls – patients without MI from the ER ward of the same hospital
• Problems?
– Cardiology ward is referral center for whole state but ER serves mostly the local city
– BP medication may be available to local city but not the rest of the rural state
• Solutions:
1. Choose controls from the whole state OR
2. Exclude all cases who live outside the city
Selection of Controls
Control selection must be independent of exposure
• Example: Is a new blood pressure medication associated with myocardial infarction?
• Problem:
– What if the drug slows reaction time, causing automobile accidents that lead to ER visits?
– Control sample is not independent of exposure
– Bias is introduced – control group has higher proportion of individuals exposed to the new drug
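The size of this bias can be sketched numerically. The Python below is a hedged illustration only: the exposure prevalence and the ER-visit multiplier are hypothetical, and the drug is assumed to have no true effect on MI, so an unbiased study should give an odds ratio of 1.

```python
def control_exposure_fraction(prevalence, er_multiplier):
    """Fraction of ER controls who are exposed, when exposed people are
    er_multiplier times as likely to end up in the ER (hypothetical model)."""
    exposed_weight = prevalence * er_multiplier
    return exposed_weight / (exposed_weight + (1 - prevalence))


def observed_or_under_null(prevalence, er_multiplier):
    """Observed odds ratio when the drug truly has no effect on MI."""
    case_odds = prevalence / (1 - prevalence)  # cases mirror the source population
    p_controls = control_exposure_fraction(prevalence, er_multiplier)
    control_odds = p_controls / (1 - p_controls)
    return case_odds / control_odds


# Hypothetical: 20% of the source population takes the drug,
# and the drug doubles the chance of an accident-related ER visit.
print(observed_or_under_null(prevalence=0.2, er_multiplier=2.0))
```

Under these assumptions the observed odds ratio is 1 / er_multiplier, so the drug would look protective even though it has no effect.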
In density case-control studies, the control group is sampled to represent the person-time distribution of the exposed and unexposed cohorts
As such (in brief):
• ‘Risk set approach’: select controls from people in the source population who are at risk of becoming a case at the exact time a case is diagnosed
• P (selection into control group) should be proportional to person-time contribution
• e.g. a person at risk for disease for 3 years should have a 3 times higher probability of being selected as a control than a person at risk for 1 year
• Controls should be eligible to become cases both at the time of selection, and throughout their time in the study
• Each case in a C-C study should have been eligible to be a control
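The risk set approach above can be sketched as follows. This is a minimal Python illustration under assumed conventions: the record fields (`id`, `entry`, `exit`, `case_time`) and the toy cohort are hypothetical, not taken from any real study.

```python
import random


def risk_set_controls(cohort, n_controls=1, seed=0):
    """For each case, sample controls from the people at risk at the
    moment that case is diagnosed (hypothetical record layout)."""
    rng = random.Random(seed)
    matched = {}
    for person in cohort:
        t = person["case_time"]
        if t is None:
            continue  # never became a case
        risk_set = [
            p["id"]
            for p in cohort
            if p["id"] != person["id"]
            and p["entry"] <= t < p["exit"]  # under observation at time t
            and (p["case_time"] is None or p["case_time"] > t)  # still disease-free
        ]
        matched[person["id"]] = rng.sample(risk_set, min(n_controls, len(risk_set)))
    return matched


# Hypothetical toy cohort (times in years)
cohort = [
    {"id": "A", "entry": 0, "exit": 3.0, "case_time": 3.0},
    {"id": "B", "entry": 0, "exit": 5.0, "case_time": None},
    {"id": "C", "entry": 4, "exit": 6.0, "case_time": None},
    {"id": "D", "entry": 0, "exit": 2.0, "case_time": None},
    {"id": "E", "entry": 1, "exit": 4.5, "case_time": 4.5},
]
print(risk_set_controls(cohort, n_controls=1))
```

Person B is at risk at both diagnosis times and so appears in both risk sets, which is how longer time at risk translates into a higher chance of selection. Person E is eligible as a control for A before later becoming a case.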
• Cases come from clearly defined group
• Controls sampled from this population
• Most feasible in population registry cases
• If group cannot be fully identified (enumerated)
• Involves sampling homes
• Often employs ‘matched sample’ design
Hospitals or Clinics
• Cases are people who received treatment at clinic X when they got the disease
• Controls are people who would be treated at clinic X if they got the disease
• May be difficult to identify this group!
• e.g. How do you define the population who would seek treatment at a regional medical center?
• How might you define and seek such controls?
Telephone Recruiting
• Limit case eligibility to people who have telephones in a specific area
• Randomly calling telephones in that area can approximate a random sample of the source population
– Use two separate control groups (e.g., hospital and community controls) and if results are the same, supports validity
OR
– Select 1 large, well-reasoned control group and invest money and effort in this group.
• If costs are similar for cases and controls:
– Most efficient: roughly equal numbers
• If number of cases is small and cannot be increased:
– Increasing the number of controls up to a ratio of 4 controls: 1 case improves power of the study
– Beyond 4:1 little power increase
– Increasing number of controls narrows the confidence interval around the odds ratio, but DOES NOT address validity
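The diminishing return beyond 4:1 can be illustrated with a common rule of thumb, which is an assumption brought in here rather than something stated on the slide: with r controls per case, statistical efficiency relative to an unlimited number of controls is roughly r / (r + 1) when the true odds ratio is near 1.

```python
def relative_efficiency(r):
    """Approximate efficiency of r controls per case, relative to an
    unlimited number of controls (rule of thumb, valid near OR = 1)."""
    return r / (r + 1)


for r in (1, 2, 3, 4, 5, 10):
    print(f"{r}:1 controls per case -> relative efficiency {relative_efficiency(r):.2f}")
```

Going from 1:1 to 4:1 raises the approximate efficiency from 0.50 to 0.80, while going from 4:1 to 5:1 adds only about 0.03.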
• There are many methods of sampling controls; density-based sampling is one of the most common
• Comes from the term incidence density (incidence rate)
• For a given period of time:
For the exposed: $I_1 = a / PT_1$
For the unexposed: $I_0 = b / PT_0$
Where $I_1$, $a$, and $PT_1$ are the incidence rate, number of cases, and total person-time accumulated in the exposed group (symbols are equivalent for the unexposed group)
• Goal of density-based sampling: estimate the contribution of the exposed and unexposed cohorts in the source population to total person time
– The odds ratio should estimate the incidence rate ratio
\[ \frac{c}{d} \approx \frac{PT_1}{PT_0} \]
Where: $c$ = number of exposed people in the control sample, $d$ = number of unexposed people in the control sample
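• Accordingly
\[ \frac{I_1}{I_0} = \frac{a / PT_1}{b / PT_0} = \frac{a}{b} \cdot \frac{PT_0}{PT_1} \approx \frac{a}{b} \cdot \frac{d}{c} \]
Fundamental assumption: $\frac{d}{c} \approx \frac{PT_0}{PT_1}$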
As such, accurately identifying the source population and taking a random sample is essential
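A small simulation can make the fundamental assumption concrete. The Python sketch below uses hypothetical case counts and person-time totals, and it assumes each control is drawn as exposed with probability equal to the exposed share of person-time. Under that assumption the odds ratio should land close to the incidence rate ratio.

```python
import random


def simulate_or(a, b, pt1, pt0, n_controls, seed=0):
    """Draw controls in proportion to person-time and return the odds ratio.

    a, b = exposed / unexposed cases; pt1, pt0 = exposed / unexposed person-time.
    """
    rng = random.Random(seed)
    p_exposed = pt1 / (pt1 + pt0)
    c = sum(rng.random() < p_exposed for _ in range(n_controls))  # exposed controls
    d = n_controls - c  # unexposed controls
    return (a * d) / (b * c)


# Hypothetical cohort totals, for illustration only
a, b = 60, 40
pt1, pt0 = 20_000, 30_000
irr = (a / pt1) / (b / pt0)
estimated_or = simulate_or(a, b, pt1, pt0, n_controls=100_000)
print(f"IRR = {irr:.2f}, OR from sampled controls = {estimated_or:.2f}")
```

If controls were sampled in a way that depends on exposure for any other reason, c / d would no longer track PT_1 / PT_0 and the odds ratio would drift away from the rate ratio.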
Exposure: Severe sun burn
                                       Yes (%)          No (%)           Total
A. Skin cancer cases                   87 (75.0)        29 (25.0)        116
B. Cohort findings (person-years)      53,697 (59.6)    36,348 (40.4)    90,045
C. Control group (people)              591 (59.1)       409 (40.9)       1,000
D. Rate (cases/10,000 person-years)    16.2             8.0              12.9
\[ \mathrm{OR} = \frac{87 \times 409}{29 \times 591} = 2.08 \qquad |\mathrm{OR} - \mathrm{IRR}| = 0.05 \]
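The figures in this table can be checked with a short Python sketch. The counts come from the table; the variable names are illustrative choices.

```python
# Counts from the severe sun burn table
cases = {"exposed": 87, "unexposed": 29}
person_years = {"exposed": 53_697, "unexposed": 36_348}
controls = {"exposed": 591, "unexposed": 409}

# Row D: incidence rates per 10,000 person-years from the full cohort
rate_exposed = cases["exposed"] / person_years["exposed"] * 10_000
rate_unexposed = cases["unexposed"] / person_years["unexposed"] * 10_000
irr = rate_exposed / rate_unexposed

# Odds ratio from the cases and the 1,000 sampled controls
odds_ratio = (cases["exposed"] * controls["unexposed"]) / (
    cases["unexposed"] * controls["exposed"]
)

print(f"Rates per 10,000 person-years: {rate_exposed:.1f} vs {rate_unexposed:.1f}")
print(f"IRR = {irr:.2f}, OR = {odds_ratio:.2f}, |OR - IRR| = {abs(odds_ratio - irr):.2f}")
```

The odds ratio is close to the rate ratio because the exposure split among controls (59.1% exposed) nearly matches the person-time split in the cohort (59.6% exposed).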
Increasing sample size reduces standard error. Standard error estimates the difference between a sample statistic (e.g. proportion of population exposed) and a true value (actual proportion exposed). The quality of a C-C estimate depends on the quality of the control sample!
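\[ L = \log\left(\frac{ad}{bc}\right) = \log\left(\frac{87 \times 409}{29 \times 591}\right) = 0.73 \]
\[ SE = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}} = \sqrt{\frac{1}{87} + \frac{1}{29} + \frac{1}{591} + \frac{1}{409}} = 0.22 \]
\[ 95\%\ \text{CI for OR} = [\exp(L - 1.96\,SE),\ \exp(L + 1.96\,SE)] = [\exp(0.73 - 1.96(0.22)),\ \exp(0.73 + 1.96(0.22))] = (1.35,\ 3.20) \]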
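The same interval can be reproduced in Python as a check. This sketch uses the counts from the sun burn example (a, b = exposed and unexposed cases; c, d = exposed and unexposed controls). Because it carries full precision instead of the rounded 0.73 and 0.22, the limits may differ from the slide's values in the second decimal place.

```python
import math

# Counts from the sun burn example
a, b = 87, 29    # exposed / unexposed cases
c, d = 591, 409  # exposed / unexposed controls

log_or = math.log((a * d) / (b * c))
se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

lower = math.exp(log_or - 1.96 * se)
upper = math.exp(log_or + 1.96 * se)

print(f"L = {log_or:.2f}, SE = {se:.2f}")
print(f"95% CI for OR: ({lower:.2f}, {upper:.2f})")
```

The lower limit stays above 1, so the data are compatible with severe sun burn being associated with a higher rate of skin cancer.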
How do we interpret this? Can we say that these data suggest severe sun burn is a risk factor for skin cancer?
• Recall bias: cases remember exposures more often than controls
• Example: study of oral contraceptive use and breast cancer
– Women with breast cancer have greater reason to search their memories for exposures that might be associated with their disease
– The cases are more likely to remember oral contraceptive use than controls
– Bias – over-estimates the risk
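The direction of recall bias can be sketched with a few lines of Python. All numbers are hypothetical: true exposure is set equal in cases and controls (a true odds ratio of 1), and the only difference is how completely each group remembers its exposure.

```python
def observed_odds_ratio(p_cases, p_controls, recall_cases, recall_controls):
    """Odds ratio computed from reported exposure, where each group reports
    only a fraction (recall) of its true exposure (hypothetical model)."""
    reported_cases = p_cases * recall_cases
    reported_controls = p_controls * recall_controls
    case_odds = reported_cases / (1 - reported_cases)
    control_odds = reported_controls / (1 - reported_controls)
    return case_odds / control_odds


# Hypothetical: 30% truly exposed in both groups;
# cases recall 90% of true exposure, controls only 70%.
biased = observed_odds_ratio(0.30, 0.30, recall_cases=0.9, recall_controls=0.7)
print(f"Observed OR with differential recall: {biased:.2f}")
```

With equal recall in both groups the function returns 1; the better recall among cases alone pushes the observed odds ratio above 1.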
• Interviewer bias: Data gatherers should be blind to case status
– If not possible, keep interviewers blind to main hypothesis
• Problem: If interviewers are not blind, they may (inadvertently) elicit information differently.
– Spend more time data gathering
– Ask more follow-up questions
• Memory aids (photographs, diaries, timelines, etc.)
• Example: study of oral contraceptive use
– photographs of pills with names
– timeline to recall major life events.
• The term ‘nested case-control’ typically refers to a case-control (CC) study nested within a clearly defined cohort (e.g. the Nurses’ Health Study or the ELSPAC groups)
• Rationale:
– You may want more data (specific to your outcome) than is available
– Too expensive to get new information from everyone in the cohort
– Instead, sample controls randomly from the cohort and only collect relevant data from them
• A good (and common) opportunity to employ matched CC designs
• Crossover studies: an experiment in which two interventions are compared; each subject acts as their own control
– Each subject receives both interventions
• The effects of each intervention are compared for every subject
• There needs to be enough time between interventions to see effects
[Figure: Intervention 1 → Measure effect 1 → Intervention 2 → Measure effect 2]
• Case-Crossover Design: all subjects are cases
• Controls are not different people; they are a sample of the cases’ time before disease
• Useful when considering short-term ‘trigger effects’
– e.g. coffee and asthma attack
– Hypothesis: coffee elevates the risk of an asthma attack for 1 hr
– ‘Case time’: one-hour periods after drinking coffee
– ‘Control time’: a sample of one-hour periods not after drinking coffee
– Calculate IRR
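The IRR in the coffee example is a ratio of two rates measured in the same people. The Python sketch below uses hypothetical attack counts and hours, chosen only to show the arithmetic.

```python
def incidence_rate_ratio(events_case, hours_case, events_control, hours_control):
    """Rate of events during 'case time' divided by the rate during 'control time'."""
    rate_case = events_case / hours_case
    rate_control = events_control / hours_control
    return rate_case / rate_control


# Hypothetical totals across all subjects:
# 12 attacks in 400 one-hour periods after coffee,
# 30 attacks in 3,000 sampled one-hour periods not after coffee.
irr = incidence_rate_ratio(12, 400, 30, 3_000)
print(f"IRR = {irr:.1f}")
```

Because every subject contributes both case time and control time, characteristics that stay fixed within a person (age, sex, genetics) cannot confound this comparison.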
• Matching is employed because:
– It addresses confounding in the design phase of a study
– It improves the efficiency of the analysis
• Example of confounding
[Diagram: cigarette smoking, heart attack, and age, with age as the confounder]
• What happens if cases are older than controls? Bias
• Matched analyses are more difficult
• If matching variables are ‘important’ (related to your variable of interest), they should be included in your analysis
– If matching is used simply to structure sampling (e.g. the neighborhood recruitment example), and the matching variables are not important, they do not have to be included
• Odds ratios can stand alone only in the absence of confounding and modifying variables: very rare!
• The multivariate extension of a non-matched case control study is logistic regression
\[ \log\left(\frac{P(\text{case})}{1 - P(\text{case})}\right) = \beta_0 + \beta_1(\text{sunburn}) + \beta_2(\text{latitude}) + \beta_3(\text{skintone}) + \beta_4(\text{skintone} \times \text{sunburn}) \]
– $\beta_1$: odds (log odds) associated with sunburn
– $\beta_2$, $\beta_3$: controlling for potential confounders
– $\beta_4$: allowing the effect of sunburn to vary between people with different skin tones
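The role of the interaction term in the logistic model above can be sketched in Python. The coefficient values and the numeric coding of skin tone are hypothetical, chosen only to show how the sunburn odds ratio changes with skin tone; they are not estimates from any data.

```python
import math

# Hypothetical coefficients, for illustration only
beta = {
    "intercept": -3.0,
    "sunburn": 0.7,
    "latitude": 0.02,
    "skintone": -0.4,
    "skintone_x_sunburn": -0.2,
}


def log_odds(sunburn, latitude, skintone):
    """Log odds of being a case under the model with an interaction term."""
    return (
        beta["intercept"]
        + beta["sunburn"] * sunburn
        + beta["latitude"] * latitude
        + beta["skintone"] * skintone
        + beta["skintone_x_sunburn"] * skintone * sunburn
    )


def sunburn_odds_ratio(latitude, skintone):
    """Odds ratio for sunburn (1 vs 0) with latitude and skin tone held fixed."""
    return math.exp(log_odds(1, latitude, skintone) - log_odds(0, latitude, skintone))


for tone in (0, 1, 2):
    print(f"skintone={tone}: OR for sunburn = {sunburn_odds_ratio(40, tone):.2f}")
```

The sunburn odds ratio works out to exp(beta_sunburn + beta_interaction × skintone): latitude cancels out of the comparison, while skin tone changes the answer through the interaction coefficient.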
Cohort Study                                                        Case-Control Study
Complete source population                                          Sampling from source population
Very expensive                                                      Less expensive
Convenient for studying many diseases                               Convenient for studying many exposures
Can calculate incidence rates/risks and their differences/ratios    Can usually only calculate the ratio of rates or risks
Can be retrospective or prospective                                 Can be retrospective or prospective
Cohort                                                   Case-control
Complete exposure information; less bias for exposure    Recall and selection bias
Can examine temporal relation                            Not always
Study multiple outcomes                                  Only one outcome
Incidence rates and RR                                   OR
Results easy to understand, straightforward              More difficult interpretation
Cohort
• Inefficient for rare disease, unless the attributable risk percent is high
• If prospective, extremely expensive and time consuming
• If retrospective, requires the availability of the records
• Validity of the results can be seriously affected by losses to follow-up
Case-control
• Optimal for rare disease
• Cheaper and quick