Exercises to Epidemiology course

X1 Exercises
Hein Stigum
1.
Precision and validity
POPs and birth weight Study
Persistent organic pollutants (POPs) are chemical substances that persist in the environment,
bioaccumulate through the food web, and pose a risk of causing adverse effects to human
health and the environment. This group of pollutants consists of pesticides (such as DDT),
industrial chemicals (such as polychlorinated biphenyls, PCBs) and unintentional by-products
of industrial processes (such as dioxins and furans).
Animals accumulate POPs in fat through their food; concentrations increase at each step in
the food chain. Humans are exposed to POPs through food, mainly from fatty fish. Breast-fed
babies are exposed through their mothers' milk. Some POPs have toxic effects on animals, affecting
development, reproduction, the immune system and the uterus.
You want to study the effect of persistent organic pollutants (POPs) on birth weight. You
recruited a random sample of 400 women at antenatal check-ups. The mothers provided milk
samples after birth, and the levels of different POPs in the milk were analyzed. You now have a
file with these levels, the birth weight of the babies, information on possible problems during
pregnancy, and the usual list of background variables: age, education, …
a)
Describe why random errors occur in the study. Where do the effects of random error
turn up in the analysis? How much do you need to increase sample size to double the
precision? (Hint: the precision is proportional to the square root of the sample size.)
b)
Describe some sources of systematic errors in the study.
c)
You have estimated the crude association between the exposure (one of the POPs) and
the outcome (birth weight) in a regression model. You add all the background
variables in one block to the model and see some change in the exposure-outcome
association. A colleague tells you that you must test the two models against each other
with a likelihood ratio test to see if the confounding is real. What is your view on this?
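The hint in part a) can be illustrated numerically. This is a minimal sketch, assuming precision is measured as the inverse of the standard error of a mean, and using a hypothetical standard deviation of 500 g for birth weight (not a value from the study):

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


SD_BIRTH_WEIGHT = 500.0  # hypothetical standard deviation in grams (assumption)

se_400 = standard_error(SD_BIRTH_WEIGHT, 400)
se_1600 = standard_error(SD_BIRTH_WEIGHT, 1600)

# Precision is taken here as 1 / standard error, so the ratio of
# precisions is the inverse ratio of the standard errors.
precision_ratio = se_400 / se_1600

print(f"SE with n = 400:  {se_400:.1f} g")
print(f"SE with n = 1600: {se_1600:.1f} g")
print(f"Precision ratio:  {precision_ratio:.1f}")
```

Changing the two sample sizes shows how the standard error responds to other increases in n.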
2.
Causation
Below are two statements on causation. Discuss these.
a) In a cohort study the exposed and unexposed groups should be as equal as possible,
except for exposure, for us to draw conclusions on cause.
b) Likewise, in a case-control study the cases and the controls should be as equal as
possible, except for disease, for us to draw conclusions on cause.
3.
Additive vs. multiplicative scale
In a study on the effect of depression on mortality you get the following (hypothetical) results:
Young            Dead   Alive   Total
depressed           5     195     200
not depressed       2     798     800

Old              Dead   Alive   Total
depressed          20     180     200
not depressed      20     780     800
You calculate the risk of death for depressed versus not depressed, and find a 10 times higher
death risk if you are depressed in the young group, and a 4 times higher death risk if you are
depressed in the old group. You conclude that: “The effect of depression on mortality
decreases with increasing age”.
Some funny bloke tells you to work on the additive scale instead. Under much doubt you
calculate the risk differences. Now what is your conclusion?
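The arithmetic behind the two scales can be sketched as follows. The counts are taken from the table in this exercise; the variable names are chosen for illustration only.

```python
def risk(dead: int, total: int) -> float:
    """Proportion dead in a group."""
    return dead / total


# (dead, total) for each group, copied from the exercise table
strata = {
    "Young": {"depressed": (5, 200), "not depressed": (2, 800)},
    "Old": {"depressed": (20, 200), "not depressed": (20, 800)},
}

results = {}
for age_group, groups in strata.items():
    r_exposed = risk(*groups["depressed"])
    r_unexposed = risk(*groups["not depressed"])
    results[age_group] = {
        "RR": r_exposed / r_unexposed,  # multiplicative scale
        "RD": r_exposed - r_unexposed,  # additive scale
    }
    print(
        f"{age_group}: RR = {results[age_group]['RR']:.1f}, "
        f"RD = {results[age_group]['RD']:.4f}"
    )
```

Comparing the two columns of output across age groups shows whether the direction of effect modification depends on the scale.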
4.
2 by 2 tables
a) You are planning a cohort study about the effect of organic food consumption on low birth
weight. Your funding allows 3000 subjects. You expect 10% with low birth weight, you
assume that 3% of the population use organic food and you want to detect a 30% protection
from organic food (that is an OR=0.7). Below is the calculation of expected precision (at 80%
power).
Cohort allocation 1

Number of subjects, N=      3 000
Proportion with disease=    10.0 %
Proportion exposed=         3.0 %
Expected OR=                0.70
Confidence level:           95 %
Power:                      80 %
Names:                      low birth weight, organic food consumption

                                low birth weight
organic food consumption        +        -      sum    Proportion with disease
+                               7       83       90     7.3 %
-                             293    2 617    2 910    10.1 %
sum                           300    2 700    3 000    10.0 %
Proportion exposed=         2.2 %    3.1 %    3.0 %

OR=0.7, 95% CI=(0.22 , 2.21)
RR=0.72, 95% CI=(0.25 , 2.09)
The results are not very promising, why?
Below is a different allocation of the same number of subjects. What have we changed, and
what is achieved?
Cohort allocation 2

Number of subjects, N=      3 000
Proportion with disease=    10.0 %
Proportion exposed=         50.0 %
Expected OR=                0.70
Confidence level:           95 %
Power:                      80 %
Names:                      low birth weight, organic food consumption

                                low birth weight
organic food consumption        +        -      sum    Proportion with disease
+                             126    1 374    1 500     8.4 %
-                             174    1 326    1 500    11.6 %
sum                           300    2 700    3 000    10.0 %
Proportion exposed=        42.0 %   50.9 %   50.0 %

OR=0.7, 95% CI=(0.5 , 0.99)
RR=0.73, 95% CI=(0.53 , 0.99)
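The intervals printed under the two cohort allocations appear to be reproduced by widening the usual Wald interval for the log odds ratio, using the sum of the normal quantiles for the confidence level and for the power. This is an inference from the printed numbers, not a documented description of the program that produced the tables. A sketch under that assumption, with the function name `expected_ci` chosen for illustration:

```python
import math
from statistics import NormalDist


def expected_ci(odds_ratio, cells, confidence=0.95, power=0.80):
    """Expected interval for an odds ratio at a given power.

    Assumption: the half-width on the log scale is
    (z for the confidence level + z for the power) * SE(log OR),
    where SE(log OR) = sqrt(sum of 1/cell) over the four cells.
    """
    z_conf = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_power = NormalDist().inv_cdf(power)
    se_log_or = math.sqrt(sum(1 / c for c in cells))
    half_width = (z_conf + z_power) * se_log_or
    return odds_ratio * math.exp(-half_width), odds_ratio * math.exp(half_width)


# Cell counts as printed in the 2 by 2 tables
allocation_1 = (7, 83, 293, 2617)
allocation_2 = (126, 1374, 174, 1326)

lo1, hi1 = expected_ci(0.70, allocation_1)
lo2, hi2 = expected_ci(0.70, allocation_2)

print(f"Allocation 1: ({lo1:.2f} , {hi1:.2f})")
print(f"Allocation 2: ({lo2:.2f} , {hi2:.2f})")
```

For allocation 1 the result differs slightly from the printed interval, presumably because the table shows rounded cell counts. The point to take from the comparison is which cell dominates the standard error in each allocation.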
b) The next example is a case-control study of the effect of a lack of the mineral selenium in the
food on the risk of developing tuberculosis (TB). Two different ways of allocating resources are
shown. Discuss the pros and cons.
Case-control allocation 1

Cases:                             35
Controls per case:                 1
Proportion of cases exposed:       50.0 %
Proportion of controls exposed:    (calculated by the program)
Expected OR:                       5.00
Confidence level:                  95 %
Power:                             80 %
Names:                             TB, low selenium

Choose one of the numbers to be calculated by the program, and write in values for the other two.
The proportion of controls exposed is calculated by the program.

                            TB
low selenium           Cases   Controls      sum
+                         18          6       23
-                         18         29       47
sum                       35         35       70
Proportion exposed=   50.0 %     16.7 %   33.3 %

OR=5, 95% CI=(1.02 , 24.39), power=80%
Case-control allocation 2

Cases:                             20
Controls per case:                 3
Proportion of cases exposed:       50.0 %
Proportion of controls exposed:    (calculated by the program)
Expected OR:                       5.00
Confidence level:                  95 %
Power:                             80 %
Names:                             TB, low selenium

Choose one of the numbers to be calculated by the program, and write in values for the other two.
The proportion of controls exposed is calculated by the program.

                            TB
low selenium           Cases   Controls      sum
+                         10         10       20
-                         10         50       60
sum                       20         60       80
Proportion exposed=   50.0 %     16.7 %   25.0 %

OR=5, 95% CI=(1.02 , 24.39), power=80%
5.
Sampling
c) In the persistent organic pollutants (POPs) and birth weight study your main suspect
for causing low birth weight is a pesticide. Only a few women in your study have high
levels of this pesticide, giving you a low variance in the exposure and hence a low
power. You decide to enrich your sample with 50 pregnant women from a
geographical area where the pesticide is in common use. What effects will this have on
your study, how will you analyze, and what regression model will you use?
d) The pesticide analyzed in a) was not responsible for low birth weight. You now have a
set of 5 to 6 POPs that are your exposure candidates. These all suffer from the same
problem as before: only a few women with high values, that is, low variation in the
exposure. You again want to enrich the sample. You reason that it may be smarter to
oversample cases of low birth weight; this will automatically give you more variation
in the actual exposure (or exposures). You enrich your sample with 50 women taken
from the medical birth registry with low birth weight babies. What effects will this
have on your study, how will you analyze, and what regression model will you use?
6.
Generalizing results
In a randomized controlled trial the researchers recruited a random sample of 400 subjects
at risk of hypertension from an area of low socioeconomic status in the US. The patients
were randomized to placebo or drug treatment, but the compliance was low; only 60% of the
treatment group took the drug. The data was analyzed both by intention to treat and by
average treatment effect. Lab results indicate that the effect of the drug on heart disease is
independent of sex and race.
a) Intention to treat analysis: the treatment/placebo groups are defined by the
randomization, regardless of whether the patients actually took the drug. The
proportion with hypertension was 0.2 in the placebo group and 0.14 in the group
randomized to treatment. This gives a risk difference of 0.06 and a relative risk of 0.7.
Can you generalize this result to other populations, and what method of generalization
are you using?
b) Average treatment effect analysis: the treatment group consists of patients who
actually took the drug, the placebo group of those who did not. The proportion with
hypertension was 0.2 in the untreated group and 0.1 in the group actually treated. This
gives a risk difference of 0.1 and a relative risk of 0.5. Can you generalize this result
to other populations, and what method of generalization are you using?
c) The average treatment effect may suffer from some confounding. How can this be the
case in a randomized trial?
(Although beside the point here: notice that RD(intention to treat) = Compliance × RD(treatment), as it
should be.)
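The figures quoted in a) and b), and the relation in the closing remark, can be checked with a few lines of arithmetic. The variable names below are illustrative; the numbers are those given in the exercise.

```python
# Proportions with hypertension, as stated in the exercise
risk_placebo = 0.20
risk_randomized_to_treatment = 0.14  # intention-to-treat group
risk_untreated = 0.20
risk_actually_treated = 0.10
compliance = 0.60  # share of the treatment group that took the drug

# a) Intention to treat
rd_itt = risk_placebo - risk_randomized_to_treatment
rr_itt = risk_randomized_to_treatment / risk_placebo

# b) Patients actually treated versus untreated
rd_treated = risk_untreated - risk_actually_treated
rr_treated = risk_actually_treated / risk_untreated

print(f"Intention to treat: RD = {rd_itt:.2f}, RR = {rr_itt:.2f}")
print(f"Actually treated:   RD = {rd_treated:.2f}, RR = {rr_treated:.2f}")
print(f"Compliance x RD(treatment) = {compliance * rd_treated:.2f}")
```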
Day 2
7.
Frequency measures
You want to estimate the risk of developing type 1 diabetes among children. You recruit a
(very, very) small cohort of 10-year-old children and follow them for 4 years. The graph
summarizes the follow-up of the six children in the cohort: x = disease, + = death, line =
follow-up.
[Figure: follow-up lines for the six children; horizontal axis: Age, 10 to 14 years]
a) Based on this cohort:
What is the prevalence of diabetes at age 14?
What is the 4-year risk of developing diabetes for 10 year old children?
b) Assume that the disease studied is not diabetes but otitis media (ear infection):
What is the prevalence of otitis media at age 14?
What is the 4-year risk of developing otitis media for 10 year old children?
The youth health study
The youth health study included 19200 individuals from the 10th grade in public and private
schools in 6 counties during 2000-2004. They were given a questionnaire on health and
lifestyle. 15 948 students answered the questions about sexual debut: Have you ever had
sexual intercourse? If so, how old were you the first time? The graph below shows the failure
functions from a survival analysis on this data.
c) Based on the graph, what is the approximate risk of having had a debut by the age of 16?
d) In epidemiological terms, what type of measure is this risk (prevalence, incidence
proportion or incidence rate)?
e) What type of design is the youth health study (cross sectional, cohort or case control)?
8.
Association measures
In the youth health study (referred to above) students were asked about their current tobacco
use in the form of smoking or snuff use. The investigators were interested in the
sociodemographic patterns of “pure” smokers, “pure” snuff users and combination users. Table
1 shows the results for daily smokers not using snuff (“pure” smokers); snuff and
combination users are not shown in this exercise.
Table 1, Daily smoking among students not using snuff

                                                  Adjusted Odds Ratio      Adjusted Risk Ratio
Variable                       N      %  p-value  (Confidence interval)    (Confidence interval)
All                        10785   14.5
Sex                                        <.001
  Boy                       5045    8.7           1.0                      1.0
  Girl                      5740   19.5           3.0 (2.7 - 3.4)          2.4 (2.2 - 2.7)
Parents' marital status                    <.001
  Living together           7165   10.4           1.0                      1.0
  Single                    3564   22.5           2.3 (2.1 - 2.6)          1.9 (1.8 - 2.1)
Educational plans                          <.001
  Academic                  5157    9.9           1.0                      1.0
  Secondary 3 years          540   15.4           1.7 (1.3 - 2.2)          1.5 (1.3 - 1.9)
  Secondary 1 year            41   22.0           2.4 (1.9 - 3.1)          2.0 (1.7 - 2.5)
  Vocational                2701   23.4           2.8 (2.5 - 3.2)          2.3 (2.0 - 2.5)
Family economy                             <.001
  Well off                   936   14.2           1.0                      1.0
  Good                      9381   14.0           1.3 (1.0 - 1.6)          1.2 (1.0 - 1.4)
  Short of money             331   27.8           1.7 (1.3 - 2.2)          1.5 (1.2 - 1.7)
a) What types of frequency measures are used? What types of association measures are used?
b) Calculate the odds of smoking for boys and girls. Calculate the crude (unadjusted) OR and
crude RR of daily smoking for girls versus boys.
c) Looking again at the effect of girls versus boys you notice that the (adjusted) odds ratio and
relative risk from the models are different, 3.0 versus 2.4. You discuss this with your
colleagues and get four different suggestions: 1) This is caused by confounding, probably
from a difference in educational plans for girls and boys. 2) This is caused by the high
prevalence of smoking, particularly among girls. 3) The difference is not important, it is only
marginally significant since the confidence intervals are touching (they are in fact just
overlapping). 4) This is caused by global climate change. What is your opinion?
d) (Probably difficult!) I have a son in 10th grade with academic plans. I am (currently) living
with my wife, and you may assume that my family economy is good. Can you tell from the
RR-model what the probability is that my son is a smoker? I now supply the extra information
that the exponential of the constant term from the model equals 0.043, and that this is the
expected prevalence for a person in the reference category for all the covariates. Can you now
answer the previous question?
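For part b), the crude measures follow directly from the percentages of daily smokers among boys and girls in Table 1. A minimal sketch, with variable names chosen for illustration:

```python
# Proportion of daily smokers among students not using snuff (Table 1)
prevalence = {"Boy": 0.087, "Girl": 0.195}


def odds(p: float) -> float:
    """Convert a proportion to odds: p / (1 - p)."""
    return p / (1 - p)


odds_boy = odds(prevalence["Boy"])
odds_girl = odds(prevalence["Girl"])

crude_or = odds_girl / odds_boy                      # girls versus boys
crude_rr = prevalence["Girl"] / prevalence["Boy"]    # girls versus boys

print(f"Odds, boys:  {odds_boy:.3f}")
print(f"Odds, girls: {odds_girl:.3f}")
print(f"Crude OR:    {crude_or:.2f}")
print(f"Crude RR:    {crude_rr:.2f}")
```

The same two lines for `crude_or` and `crude_rr`, applied to smaller prevalences, show how the gap between the odds ratio and the risk ratio depends on how common the outcome is.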
9.
Attributable fractions
Data from the Norwegian Mother and Child Cohort show that obesity is a risk factor for
caesarean section. Below is a table of BMI categories, and a table of the frequency of
caesarean section for each category of BMI.
a) Assuming that obesity causes caesarean section, calculate by what percentage caesarean
sections would drop in this population if obese women were (miraculously) turned into
normal weight women.
b) Below are the results from a logistic regression of caesarean section on BMI and some
possible confounders. The attributable fraction of obesity is calculated from the model (using
“aflogit” in Stata). Should the population attributable fractions from a) and b) be the same?
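The calculation asked for in a) has the following shape. The sketch below uses hypothetical counts and risks (not the cohort's data) purely to show the arithmetic: count the expected cases as observed, count them again with obese women given the risk of normal weight women, and take the relative drop.

```python
# HYPOTHETICAL numbers for illustration only, not from the cohort:
# BMI category -> (number of women, risk of caesarean section)
bmi_groups = {
    "normal": (700, 0.10),
    "overweight": (200, 0.15),
    "obese": (100, 0.25),
}

# Expected number of caesarean sections as observed
cases_observed = sum(n * risk for n, risk in bmi_groups.values())

# Counterfactual: obese women get the risk of normal weight women
risk_normal = bmi_groups["normal"][1]
cases_counterfactual = sum(
    n * (risk_normal if group == "obese" else risk)
    for group, (n, risk) in bmi_groups.items()
)

# Crude population attributable fraction of obesity
paf = (cases_observed - cases_counterfactual) / cases_observed
print(f"Attributable fraction: {paf:.1%}")
```

This crude version ignores confounders, which is the contrast with the model-based fraction in b).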