Research Methods - Afya Bora Consortium

AFYA BORA CONSORTIUM GLOBAL HEALTH LEADERSHIP FELLOWSHIP
PROGRAM
Research Methods
Distance Learning
AFYA BORA CONSORTIUM
Research Methods Module
Module Instructors:
Brandon Guthrie, PhD
Acting Instructor, Department of Epidemiology
University of Washington
Email: bguth@uw.edu
Skype: brguth
Carey Farquhar, MD, MPH
Associate Professor, Departments of Medicine, Epidemiology, and Global Health
Director, Afya Bora Fellowship in Global Health Leadership
Email: cfarq@uw.edu
Skype: careyfarquhar
Table of Contents
Course Structure
Learning Objectives
Introduction to Epidemiologic Methods and Quantitative Research
Introduction to Statistical Decision Making
Epidemiologic Study Designs
Causation, Bias, and Confounding
Measurement, Classification, and Misclassification
Data Management Practices in Health Research
Interpretation of Epidemiologic Studies and Decision Making
Qualitative Research Methods
Analyzing Qualitative Data and Public Health Applications
Course Schedule
Appendix 1: List of Lecturers
Appendix 2: Review Questions
Lecture 1: Introduction to Epidemiologic Methods and Quantitative Research
Lecture 2: Introduction to Statistical Decision Making
Lecture 3: Epidemiologic Study Designs
Lecture 4: Bias, Confounding, and Effect Modification
Lecture 5: Measurement, Classification, and Misclassification
Lecture 6: Data Management Practices in Health Research
Lecture 7: Interpretation of Epidemiologic Studies and Decision Making
Lecture 8: Multiple Variable Regression Models in Epidemiology
Lecture 9: Qualitative Research Methods
Lecture 10: Analyzing Qualitative Data and Public Health Applications
Course Structure:
To successfully complete the Research Methods module, you will need to watch each lecture and
complete the associated quiz in accordance with the course schedule. Each recorded lecture is
available online through the TREE Distance Learning portal
(http://www.tree4health.org/distancelearning/). You will be assigned a username and password
allowing you to log onto the portal. Once you are logged on, click on the Research Methods
Module in the Learning Modules box. In the Research Methods Module, you can monitor your
progress through the module and navigate between lectures. For each lecture, you can view the
recorded session, download the session for later viewing, and download the slides and associated
material. After viewing each lecture, students should complete the associated quiz.
Course Instructors will be available via Skype bi-weekly (schedule TBD) to discuss each topic.
You are limited to 1 attempt on each quiz. If you achieve an aggregate average score of 70% or
greater on the quizzes, you will be eligible to take the final exam.
The Research Methods module consists of:
A. 10 one-hour lecture sessions,
B. 10 quizzes, and
C. 1 final exam.
Learning Objectives:
Introduction to Epidemiologic Methods and Quantitative Research
1. Give an example of a disease that is distributed unevenly in a population and what the
distribution might tell you about the cases of disease.
2. Define prevalence and incidence and describe the steps to measure each in a typical
epidemiologic study.
3. Answer the question: How do you compare disease risk between two groups and how do
you interpret these comparisons?
4. Summarize the principles for inferring causal relationships from epidemiologic data.
Introduction to Statistical Decision Making
1. List and describe the standard measures of location and spread.
2. Give examples of how graphical displays of data can be used to supplement formal
statistical analysis.
3. Understand the relationship of hypothesis testing and independence of data.
4. Answer the question: What is a p-value and how is it used to assess the strength of
statistical associations?
Epidemiologic Study Designs
1. Compare and contrast cohort and case control studies and provide examples of when each
study design would be appropriate and preferred.
2. Answer the question: What are the advantages and disadvantages of matching in
epidemiologic studies?
3. Answer the question: What are the primary strengths and weaknesses of randomized
trials?
4. Answer the question: How do ecological studies differ from the other types of
epidemiologic studies?
Causation, Bias, and Confounding
1. List the criteria that allow epidemiologists to assess causal relationships between
exposure and disease.
2. Define bias in epidemiologic studies and describe the main categories of bias.
3. Describe the most common strategies for controlling confounding.
Measurement, Classification, and Misclassification
1. Give an example of how the research question of interest will dictate how subjects are
classified in terms of exposure and disease.
2. Compare and contrast the impacts of non-differential and differential (selective)
misclassification.
3. Define and describe how to calculate sensitivity, specificity, positive predictive value,
and negative predictive value.
Data Management Practices in Health Research
1. Describe how study design will influence data management strategies.
2. Give examples of data entry techniques that minimize errors.
3. Outline quality control measures that can improve data quality.
Interpretation of Epidemiologic Studies and Decision Making
1. Understand how to interpret the various measures of test performance;
2. Explain how evidence from observational studies can be used to infer causal relations
between exposures and disease incidence;
3. Describe the criteria that should be used when deciding if a screening test should be used
to detect disease.
Qualitative Research Methods
1. Define phenomenology and grounded theory methods and provide examples of how these
methods can be used to address a public health question.
2. Provide a data collection strategy that could be used in a qualitative research study.
3. Compare and contrast quantitative and qualitative research methods.
Analyzing Qualitative Data and Public Health Applications
1. Provide strategies for managing qualitative data.
2. Define coding and differentiate between types of codes.
3. Illustrate how qualitative data is presented in a paper.
Course Schedule:
Lecture                                                                Quiz Due Date
1   Introduction to Epidemiologic Methods and Quantitative Research    November 9th
2   Introduction to Statistical Decision Making                        November 9th
3   Epidemiologic Study Designs                                        November 16th
4   Causation, Bias, and Confounding                                   November 23rd
5   Measurement, Classification, and Misclassification                 November 23rd
6   Data Management Practices in Health Research                       November 30th
7   Interpretation of Epidemiologic Studies and Decision Making        November 30th
8   Multiple Variable Regression Models in Epidemiology                December 7th
9   Qualitative Research Methods                                       December 14th
10  Analyzing Qualitative Data and Public Health Applications
    FINAL EXAM DUE                                                     December 14th

*Skype Session
November 13th
November 27th
December 11th
December 18th
December 21st
*Course instructors will be available via Skype bi-weekly on Tuesday @ 9:00AM PST /
8:00PM EAT
Appendix 1: List of Lecturers
Carey Farquhar, MD, MPH
Associate Professor
Departments of Medicine, Epidemiology,
and Global Health
cfarq@uw.edu
Lecture 1
Barbra Richardson, PhD
Research Professor
University of Washington,
Department of Biostatistics
barbrar@uw.edu
Lecture 2
Lisa Manhart, PhD
Associate Professor
University of Washington
Departments of Epidemiology and Global
Health
lmanhart@uw.edu
Lecture 3
Victoria Holt, PhD
Professor
University of Washington
Department of Epidemiology
vholt@uw.edu
Lecture 4
Brandon Guthrie, PhD
Acting Instructor
University of Washington
Department of Global Health
brguh@uw.edu
Lectures 5 and 6
Noel Weiss, MD, DrPH
Professor
University of Washington
Department of Epidemiology
nweiss@uw.edu
Lecture 7
Romel Mackelprang, PhD
Senior Fellow
University of Washington
Department of Global Health
romelm@uw.edu
Lecture 8
Michele Andrasik, PhD
Acting Assistant Professor
University of Washington
Department of Psychiatry and Behavioral
Science
mandrasik@fhcrc.org
Lecture 9
Kate Murray, MPH
University of Washington, Center for
AIDS Research
krmurray@u.washington.edu
Lecture 10
Appendix 2: Review Questions
INSTRUCTIONS: Review questions are provided for each of the 10 lectures included in
this module. The relevant questions should be presented after each lecture in quiz format.
Participants should answer at least 70% of questions correctly to successfully complete
this module. The instructor can choose to use a subset of the questions for quizzes and use
the remaining questions for a final exam at the end of the module.
Before presenting quiz and exam questions, the instructor should remove the answers and
explanations. Discussion sessions can be organized to discuss the review questions after all
participants have taken the quiz.
Lecture 1: Introduction to Epidemiologic Methods and Quantitative Research
1) A fellow researcher wants to compare the incidence of death from TB among HIV-1-infected women between Nairobi and Kisumu. She finds that 2,324 women with HIV
died from TB in Nairobi in 2009 and 927 women with HIV died in Kisumu in 2009. In
order to compare the incidence of death between the two cities, which additional
denominator values does she need to collect?
A. The total populations of Nairobi and Kisumu in 2009
B. The number of women infected with Mycobacterium tuberculosis who were living in
Nairobi and Kisumu in 2009
C. The number of women co-infected with HIV and M. tuberculosis who were living in
Nairobi and Kisumu in 2009
D. The number of HIV-1 infected women who were living in Nairobi and Kisumu in
2009
ANSWER: D
EXPLANATION: The value of interest here is the “incidence of death from TB among
HIV-1-infected women.” Therefore, the numerator is the number of deaths attributable to
TB among women infected with HIV during 2009, and the denominator is the total
number of HIV-1-infected women living in the two cities during 2009.
2) A recent study published in the Journal of the American Medical Association found that
approximately 1 in 4 American women (age 14 to 59 years) are infected with HPV. This
estimate is an example of:
A. Incidence rate
B. Incidence number
C. Prevalence
D. Proportionate mortality
ANSWER: C
EXPLANATION: This is a measure of prevalence because the researchers assessed
current, rather than new, infections. If the researchers had started with a group of
women without HPV and monitored them over time for acquisition of HPV, then they
would have been measuring incidence.
3) Studies demonstrate that cigarette smoking increases the risk of heart disease. In a large
study, Dr. Cardio found that the annual incidence of heart disease was 32 per 1,000
among those with 20 pack-years of smoking at baseline (i.e., heavy smoking) and 10 per
1,000 among those who never smoked.
Based on this information, calculate the relative risk of heart disease due to heavy
smoking.
A. 32 / 10 = 3.2
B. 32 – 10 = 22
C. 32 / (32 + 10) = 0.76
D. Cannot be determined with the given information
ANSWER: A
EXPLANATION: The relative risk is calculated by dividing the incidence in the exposed
by the incidence in the unexposed. In this example, the incidence in the exposed (heavy
smokers) is 32 per 1000 and the incidence in the unexposed (never smokers) is 10 per
1000. Thus, RR = (32/1000) / (10/1000) = 32 / 10 = 3.2
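The calculation in this explanation can be sketched in a few lines of Python. This is only an illustration: the function names `relative_risk` and `risk_difference` are hypothetical helpers introduced here, not part of the course materials, and the inputs are the two incidence figures quoted in the question.

```python
def relative_risk(incidence_exposed, incidence_unexposed):
    """Incidence in the exposed divided by incidence in the unexposed."""
    if incidence_unexposed == 0:
        raise ValueError("incidence in the unexposed group must be non-zero")
    return incidence_exposed / incidence_unexposed


def risk_difference(incidence_exposed, incidence_unexposed):
    """Absolute difference in incidence (what option B computes)."""
    return incidence_exposed - incidence_unexposed


# Annual incidence of heart disease from the question
heavy_smokers = 32 / 1000   # exposed: 20 pack-years at baseline
never_smokers = 10 / 1000   # unexposed

rr = relative_risk(heavy_smokers, never_smokers)
rd = risk_difference(heavy_smokers, never_smokers)

print(f"Relative risk: {rr:.1f}")
print(f"Risk difference: {rd * 1000:.0f} per 1,000 per year")
```

Because both incidences share the same denominator (per 1,000), the denominators cancel, which is why the ratio reduces to 32 / 10.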
4) The following table shows the number of new cases of whooping cough (Pertussis) by
age groups, 2005.
Age Group    Mid-year Population    Number of New Cases
0-5          1,643                  231
6-9          1,427                  195
10-14        1,019                  460
15-24        783                    965
25-54        3,570                  452
55+          1,836                  101
Total        10,278                 2,404
The annual incidence for the 10-14 age group was:
A. (3,570/10,278)*100 = 34.7 per 100
B. (460/10,278)*100 = 4.5 per 100
C. (460/1,019)*100 = 45.1 per 100
D. (460/3,570)*100 = 12.9 per 100
E. (2,404/10,278)*100 = 23.4 per 100
ANSWER: C
EXPLANATION: In this example, we will use the mid-year population in 2005 as our
best estimate of the number of people “at risk” of acquiring whooping cough, from which
the incident cases arose. We are looking for the incidence among those 10-14 years of
age, and therefore we will restrict the number of new cases (460) and the number “at
risk” (1,019) to this age group. Therefore, the incidence is 460/1,019 * 100 = 45.1 per
100.
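As a rough sketch, the same age-specific calculation can be applied to every row of the table at once. The dictionary below copies the mid-year populations and new case counts from the question; the variable and function names are hypothetical and chosen only for this illustration.

```python
# (mid-year population, number of new cases) by age group, from the table in question 4
pertussis_2005 = {
    "0-5": (1643, 231),
    "6-9": (1427, 195),
    "10-14": (1019, 460),
    "15-24": (783, 965),
    "25-54": (3570, 452),
    "55+": (1836, 101),
}


def incidence_per_100(new_cases, population_at_risk):
    """Annual incidence expressed per 100 people at risk."""
    return new_cases / population_at_risk * 100


# Restrict both numerator and denominator to the same age group
age_specific = {
    group: incidence_per_100(cases, population)
    for group, (population, cases) in pertussis_2005.items()
}

# Overall incidence uses the column totals
total_population = sum(population for population, _ in pertussis_2005.values())
total_cases = sum(cases for _, cases in pertussis_2005.values())
overall = incidence_per_100(total_cases, total_population)

for group, value in age_specific.items():
    print(f"{group:>6}: {value:.1f} per 100")
print(f" Total: {overall:.1f} per 100")
```

The key point the sketch makes is that the numerator and denominator must come from the same row; mixing a group's cases with another group's population, or with the total population, gives the incorrect options.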
5) The finding that the risk of cervical cancer increases with the number of lifetime sex
partners contributed to the understanding that cervical cancer is a sexually transmitted
infection. This finding contributes to causal inference because it best demonstrates:
A. Consistency (replication of findings)
B. Biological gradient (dose response)
C. Strength of association
D. Temporal order
ANSWER: B
EXPLANATION: One criterion that we use to draw causal inference about an exposure-disease relationship is the observation that higher levels of exposure are associated with
a higher likelihood of disease. While the presence or absence of a clear dose-response
relationship is not definitive, it is an important component of causal inference. In this
example, we observe that the number of lifetime sex partners is associated with the
likelihood that a woman develops cervical cancer. The number of lifetime partners is
associated with the risk of acquiring a sexually transmitted disease. Therefore, this
observed relationship provides support for the hypothesis that cervical cancer is caused
by a sexually transmitted infection.
6) You are a clinician treating HIV patients. You are planning for the next year, and you are
trying to decide how many clinical staff members you will need to work with patients as
they start ART. This can be a time-consuming process, and you need to know how to
plan your resources.
To answer this question, which of the following would you most want to know?
A. The prevalence of HIV patients in your clinic who are on ART.
B. The incidence rate of HIV patients in your clinic starting ART.
ANSWER: B
EXPLANATION: In this example, you are attempting to plan for the number of people
who will be starting ART. Therefore you are interested in the incidence of starting ART.
7) Currently, patients with HIV who have a CD4 count <250 are recommended to start
antiretroviral therapy (ART). You are on a Ministry of Health committee that is deciding
if the CD4 criteria for starting ART should change from 250 to 350.
As part of your decision-making process, you want to know how many people would be
affected by this change. To answer this question, which of the following would you most
want to know?
A. The prevalence of HIV patients with a CD4 count between 250 and 350.
B. The incidence rate of HIV patients dropping below a CD4 count of 350.
ANSWER: A
EXPLANATION: The prevalence tells you how many people currently have a CD4 count
between 250 and 350. These are the people that would be affected by the change in
guidelines. The incidence would tell you the rate at which people drop below 350, but
would not tell you how many people would be affected.
8) A study has just been completed in which the researchers investigated if HIV disease
progression could be improved by providing patients with bed nets to reduce malaria. A
total of 1,000 patients with HIV and a CD4 count between 350 and 450 were recruited,
500 of whom were randomly assigned to receive an insecticide treated bed net.
After 3 years of follow-up with perfect retention, 173 of the 500 patients who received a
bed net had progressed to AIDS while 221 of the 500 patients who did not receive a bed
net had progressed to AIDS. What is the relative risk of progression to AIDS associated
with using a bed net?
A. (173*279) / (327*221) = 0.67
B. (173/500) / (221/500) = 0.78
C. (173/500) - (221/500) = -0.096
D. (327/500) / (279/500) = 1.17
E. (173/500) / (279/500) = 0.62
ANSWER: B
EXPLANATION: A 2x2 table can be constructed to summarize the data. In this example,
the outcome, or disease, is progression to AIDS. The exposure is receiving an insecticide
treated bed net.
                Disease +    Disease -    Total
Exposure +      a = 173      b = 327      a + b = 500
Exposure -      c = 221      d = 279      c + d = 500

RR = [a / (a + b)] / [c / (c + d)] = (173/500) / (221/500) = 0.78
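The 2x2 layout above maps directly onto a small function that takes the four cell counts. The sketch below is illustrative only; the function names are hypothetical. The odds ratio is included purely for contrast, since the cross-product expression in option A is an odds ratio rather than a relative risk.

```python
def relative_risk_2x2(a, b, c, d):
    """Relative risk from 2x2 cell counts.

    a: exposed with disease       b: exposed without disease
    c: unexposed with disease     d: unexposed without disease
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


def odds_ratio_2x2(a, b, c, d):
    """Cross-product (odds) ratio, shown only to contrast with the relative risk."""
    return (a * d) / (b * c)


# Bed net trial: exposure = received a bed net, disease = progression to AIDS
a, b = 173, 327   # bed net: progressed, did not progress
c, d = 221, 279   # no bed net: progressed, did not progress

rr = relative_risk_2x2(a, b, c, d)
odds_ratio = odds_ratio_2x2(a, b, c, d)

print(f"Relative risk: {rr:.2f}")
print(f"Odds ratio:    {odds_ratio:.2f}")
```

When the outcome is common, as it is here, the odds ratio lies further from 1 than the relative risk, so the two measures should not be used interchangeably.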
Lecture 2: Introduction to Statistical Decision Making
1) A box plot allows you to look at which of the following?
A. Sample median
B. Sample spread
C. Potential outliers
D. All of the above
ANSWER: D
EXPLANATION: A boxplot is a succinct way of presenting continuous data. It shows
both the “central tendency” of the data with the median, as well as the spread with the
25th and 75th percentiles and the upper and lower whiskers. The investigator can also
determine if the data are skewed and if there are any extreme outliers. The figure below
details the information provided in a box plot.
[Figure: annotated box plot. Labels, from top to bottom:
Outliers
Upper whisker: largest value ≤ Q3 + (1.5 * IQR)
Q3: 75th percentile
Q2: median
Q1: 25th percentile
Lower whisker: smallest value ≥ Q1 – (1.5 * IQR)
Outliers]
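The quantities shown in a box plot can be computed directly, which may help make the whisker rule concrete. The following is a minimal sketch using Python's standard `statistics` module. The sample values are hypothetical, the function name `box_plot_summary` is introduced here for illustration, and the quartile convention (`method="inclusive"`) is an assumption: statistical packages differ slightly in how they interpolate quartiles.

```python
import statistics


def box_plot_summary(values):
    """Compute the quantities a box plot displays, using the 1.5 * IQR rule."""
    data = sorted(values)
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    # Whiskers extend to the most extreme observations inside the limits
    inside = [x for x in data if lower_limit <= x <= upper_limit]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": [x for x in data if x < lower_limit or x > upper_limit],
    }


# Hypothetical sample with one extreme value
sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = box_plot_summary(sample)

for name, value in summary.items():
    print(f"{name}: {value}")
```

In this hypothetical sample the value 30 lies above Q3 + 1.5 * IQR, so it is reported as an outlier and the upper whisker stops at the largest remaining value.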
2) Based ONLY on the figure below, what do you conclude about the relationship between
CD4 count and viral load?
[Figure: scatter plot of Log Viral Load (y-axis) against CD4 Count (x-axis).]
A. There is no relationship between CD4 count and viral load
B. Increases in CD4 count are associated with increases in viral load
C. Increases in CD4 count are associated with decreases in viral load
D. There is a causal relationship between CD4 count and viral load
ANSWER: C
EXPLANATION: There is an overall trend in the data from the upper left to the lower
right. When we inspect the axes of the plot, we conclude that the upper left represents
patients with low CD4 counts and high viral loads, while the lower right represents
patients with high CD4 counts and low viral loads. Based only on this figure, we cannot
assess the causal relationship between viral load and CD4 count, or even if this is a
statistically significant association, but from this visual inspection, we get a sense of the
relationship between these two variables, providing a starting point for further
investigation.
3) After conducting a study investigating the potential relationship between daily Septrin
use and HIV disease progression, you find a relative risk (RR) of 0.82 with a p-value of
0.11. What can you conclude about the relationship between Septrin use and disease
progression?
A. There is no relationship between daily Septrin use and disease progression
B. Daily Septrin use slows disease progression
C. There is insufficient evidence to reject the null hypothesis that there is no relationship
between daily Septrin use and disease progression
D. The study was underpowered to detect a true relationship between daily Septrin use
and disease progression
ANSWER: C
EXPLANATION: The point estimate for the relative risk for the relationship between
Septrin use and disease progression is 0.82, which indicates those on Septrin are less
likely to experience disease progression; however, the p-value for this relative risk is
0.11. Therefore, we cannot reject the null hypothesis that there is no association (i.e., RR
= 1). There are two possible explanations for this finding: (1) the true relative risk is less
than 1, but there was insufficient power to show a statistically significant difference, or
(2) the true relative risk is 1, and we observe the value of 0.82 only by random chance.
Based on the information provided, we cannot determine if there was adequate power to
detect a relative risk of 0.82.
4) You hypothesize that the viral load in population A is higher than the viral load in
population B. Which measure should you use to summarize this difference?
A. Mean
B. Variance
C. Range
D. Power
ANSWER: A
EXPLANATION: In this example, we are interested in a measure of location. Of the
options available, only the mean is a measure of location. Both the variance and the
range are measures of spread. Power is calculated when planning a study to determine
the probability of finding a significant difference, assuming a true difference of a given
magnitude.
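To illustrate the distinction between location and spread, the sketch below summarizes two small samples with Python's standard `statistics` module. The log viral load values are hypothetical and the function name `summarize` is introduced only for this example.

```python
import statistics


def summarize(values):
    """Return one measure of location and two measures of spread."""
    return {
        "mean": statistics.mean(values),          # location
        "variance": statistics.variance(values),  # spread (sample variance, n - 1)
        "range": max(values) - min(values),       # spread
    }


# Hypothetical log10 viral loads for two populations
population_a = [4.5, 5.0, 5.5]
population_b = [2.0, 4.0, 6.0]

summary_a = summarize(population_a)
summary_b = summarize(population_b)

print("Population A:", summary_a)
print("Population B:", summary_b)
```

In these made-up data, population A has the higher mean but the smaller variance and range, which shows that a difference in location says nothing about a difference in spread.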
5) You know that drug A and drug B have the same mean effect on viral load, but you
hypothesize that there is more variability in the effect of drug A compared to drug B. You
analyze the results from a randomized trial in which 500 patients received drug A and
200 patients received drug B. Which of the following measures should you use to
investigate if your hypothesis might be true?
A. Median
B. Variance
C. Minimum
D. Mode
ANSWER: B
EXPLANATION: We are interested in the variability of the effect of drugs A and B. We
may suspect that while the drugs have the same overall effect, the consistency of the
effect differs between the two drugs. In the hypothetical figure below, both drugs have the
same median effect, but the variability of Drug A is less than that of Drug B, indicated by the
smaller interquartile range. Among the measures provided as options, only the variance is
a measure of spread/variability.
[Figure: box plots of change in viral load (y-axis) for Drug A and Drug B.]
6) Which group has the higher median CD4 count?
[Figure: side-by-side box plots of CD4 count (y-axis, ticks at 0, 100, 200, and 300) for Group A and Group B.]
ANSWER: A
EXPLANATION: The line passing through the middle of each box represents the median
of the distributions for the two groups. Thus, the median CD4 count for Group A is
approximately 220 cells/μL and the median for group B is approximately 140 cells/μL.
Therefore, the median CD4 count is higher for group A.
Lecture 3: Epidemiologic Study Designs
1) You are interested in investigating if HSV-2 infection is associated with acquisition of
HIV in women. To address this question you design an observational prospective cohort
study. Which of the following describes how you would carry out this study?
A. Recruit a group of women with HIV and a group of women without HIV and test the
women in each group for HSV-2 infection. Compare the proportion of HIV-infected
women who have HSV-2 with the proportion of HIV-uninfected women with HSV-2.
B. Identify women without HIV and separate the women into those who are infected
with HSV-2 and those without HSV-2. Then follow the women in each group for 2
years, testing them each month for HIV infection. Compare the rate of HIV infection
in the HSV-2 infected and uninfected groups.
C. Identify a group of women without HIV who are all infected with HSV-2. Randomize
half of the women to receive Acyclovir (a drug that suppresses HSV-2) and the other
half to receive a placebo (no active drug). Follow the women for 2 years and compare
the rate of HIV acquisition between the women on Acyclovir and those on placebo.
D. None of the above describes a prospective cohort study.
ANSWER: B
EXPLANATION: You are designing a prospective cohort study. Prospective means that
the follow-up of participants will occur after initiation of the study. A cohort study is an
observational study design where you start by identifying participants with and without
the exposure of interest and then follow them up for the outcome. In the correct answer
above, you will first identify women with and without HSV-2 infection (i.e., the exposure
of interest) and then follow them for the acquisition of HIV (i.e., the outcome of interest).
2) Which of the following is NOT TRUE regarding reasons to choose a randomized study
design?
A. Randomization minimizes confounding
B. Causal inference is easier in randomized studies
C. It is always possible to randomly assign exposure
D. Randomized studies are generally easier to analyze and interpret
ANSWER: C
EXPLANATION: One of the primary reasons to use a randomized trial design is to gain
maximum control of confounding by randomly assigning participants to the exposure
groups. Therefore, causal inference can be drawn from randomized trials because there
should be no confounding factors obscuring the exposure-disease relationship.
Unfortunately, it is not possible to randomly assign every exposure. It is unethical to assign
participants to receive an exposure that is known to cause harm (e.g., smoking). It is also
impractical to investigate some exposure-disease relationships using a randomized trial
because the time between exposure and disease onset is long, or because the frequency of
disease, even among those exposed, is very low. Because of the control of confounding
through randomization, randomized trials are generally easier to analyze and interpret.
3) Multiple observational studies have shown evidence that male circumcision can reduce
the risk of acquiring HIV. Which of the following is a reason why randomized trials were
necessary before recommending circumcision as an HIV prevention intervention?
A. In this example, observational studies may not fully account for confounding and may
not accurately reflect the true relationship between male circumcision and HIV
acquisition. Randomized trials were needed to control confounding.
B. Observational studies should only be used for exploratory studies and should not be
used to guide public health practice.
C. The observational trials did not allow for enough time between circumcision and HIV
infection.
D. Randomized trials were not necessary. The observational studies established the causal
association.
ANSWER: A
EXPLANATION: In Africa, circumcision is highly culturally defined, such that some
cultures and religions prescribe that all males be circumcised, while other cultures or
religions never implement circumcision. Other sexual behaviors associated with higher
or lower HIV risk are also highly associated with cultural or religious membership.
Therefore, it is very difficult to fully control for confounders of the relationship between
circumcision and HIV risk. An intervention recommending that all men be circumcised to
reduce HIV risk requires a high degree of evidence supporting the effectiveness due to
the potential risks of circumcision and the large scope of the intervention.
4) Which of the following is NOT an advantage of a cohort study over a case-control study?
A. It is possible to calculate incidence rates from a cohort study but not a case-control
study.
B. A cohort study is more efficient than a case-control study for investigating rare
diseases.
C. A cohort study is more efficient than a case-control study for investigating rare
exposures.
D. A cohort study can be used to investigate more than one outcome (disease) while a
case-control study can only investigate one pre-specified outcome.
ANSWER: B
EXPLANATION: Cohort studies begin with a group of participants with a given exposure
and a group without that exposure. Both groups are followed up for the outcome(s) of
interest. Because the distribution of the outcome is not manipulated, it is possible to
calculate the incidence of disease in both groups and to directly calculate the relative
risk. Cohort studies are efficient for studying rare exposures because the researcher can
specify the number of exposed and unexposed participants, but this design is inefficient to
investigate rare outcomes because it would require a very large sample size to achieve a
sufficient number of outcomes to reach statistical significance.
5) An outbreak of cholera has occurred in a village of 312 people. Investigators find that
residents of the village get their water from one of three sources. The investigators want
to determine which of the water sources are contaminated. They identify every resident of
the village and test them for infection with Vibrio cholerae (the causative agent of cholera)
and determine where each person gets their water. They then calculate the proportion of
people who are infected with Vibrio cholerae, and compare the proportions infected from
each water source. What type of study design is described here?
A. Cohort study
B. Case-control study
C. Cross-sectional study
D. Ecological study
ANSWER: C
EXPLANATION: This is best described as a cross-sectional study. Exposure and disease
were measured at the same time, without consideration of the timing of exposure relative
to disease. Cross-sectional studies are often a first step in epidemiologic investigations
because they are generally easier and less expensive to conduct than other study designs;
however, cross-sectional studies are limited by the challenge of establishing the temporal
sequence. Additionally, in a cross-sectional study, factors associated with the disease
may be related to the risk of developing the disease, or to the duration of disease. Thus,
the interpretation of cross-sectional studies should be done with caution.
6) You are studying the relationship between exclusive breastfeeding and gastrointestinal
infection among HIV-uninfected infants born to infected mothers. You decide to recruit a
group of women who have chosen to breastfeed exclusively and a group of women who
have chosen to formula feed. You ask the women to record the number of diarrheal
episodes their infants have over a 6 month period and compare the number of episodes
experienced by infants in the two groups. What type of study is this?
A. Cohort study
B. Case-control study
C. Randomized trial
D. Ecological study
ANSWER: A
EXPLANATION: The study subjects were selected based on their exposure status, which
was chosen by the subjects, and followed up prospectively. Therefore, this is a
prospective cohort study.
7) You are concerned that a common anti-malarial medication given to children may
increase the risk of developing childhood leukemia. You know that leukemia is a rare, but
serious disease. What would be the best study design to test your hypothesis?
A. Ecological study
B. Case-control study
C. Randomized trial
D. Cohort Study
ANSWER: B
EXPLANATION: A case-control study is usually the best option when investigating rare
diseases. Using a case control design, the investigator controls the number of diseased
and non-diseased subjects in the study. Therefore, the investigator can include as many
cases as he or she can identify. An additional advantage of this approach is that it is
possible to investigate multiple exposures in the study. If a cohort study or randomized
trial were conducted, it would be necessary to enroll a very large number of subjects to
ensure that there are an adequate number of disease outcomes to draw a conclusion
about the exposure-disease relationship. Such an approach is very inefficient because the
vast majority of subjects will not develop disease.
Lecture 4: Bias, Confounding, and Effect Modification
1) Which of the following is TRUE about an exposure that is causally associated with a
disease?
A. The exposure must cause disease in all people that are exposed.
B. All people with the disease must have been exposed.
C. The exposure must precede the onset of disease.
D. The exposure must be common.
ANSWER: C
EXPLANATION: For an exposure to be causally related to a “disease” outcome, the
exposure must always precede the outcome. An exposure need not cause disease in all
those exposed. For example, many people smoke but not all develop lung cancer. This
does not affect our conclusion that smoking is causally related to lung cancer. Similarly,
not all cases of disease need to have been caused by the exposure. Using the same
example, while many cases of lung cancer are due to smoking, lung cancer occurs in
non-smokers due to other causes. Finally, while rare exposures are more difficult to
investigate, particularly using a case-control design, rare exposures can be causally
related to an outcome.
2) A clinician involved in the management of patients with HIV observed that, in a 1-year
period, 10% of patients on antiretroviral therapy (ART) died compared to 6% of patients
not on ART. She is concerned that ART might be causing deaths rather than preventing
disease progression. This conclusion:
A. Is correct.
B. May be incorrect because patients starting ART may be much sicker than patients not
on ART and therefore at greater risk of dying despite being on ART.
C. May be incorrect because there is no comparison group.
D. May be incorrect because incidence rates should have been calculated instead of the
proportions that were calculated.
ANSWER: B
EXPLANATION: The observation of an association between an exposure and disease
does not mean that a causal association exists. The scenario described here is likely an
example of what epidemiologists call confounding. Confounding is the mixing of the
effect of two factors on an outcome of interest. In this case, a causal association exists
between a patient’s disease status (i.e., how sick they are) and their likelihood of dying.
Unfortunately, a patient’s disease status is also related to their likelihood of starting
ART. Thus, the sickest patients are most likely to start ART. Methods are available to
overcome, at least partially, the effect of confounding, but epidemiologists must always
consider the potential that a confounding factor may account for the observed
exposure-disease relationship.
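One way to see how confounding by disease severity could produce the clinician's observation is to stratify the comparison. The sketch below uses invented counts, chosen only so that the crude mortality is 10% on ART and 6% off ART as in the question. The strata, the counts, and the function names are all hypothetical. The Mantel-Haenszel summary relative risk is used here as one common method of adjustment.

```python
# Hypothetical counts, invented for illustration: (deaths, patients)
# within each stratum of disease severity, by ART status.
strata = {
    "advanced disease": {"art": (96, 800), "no_art": (40, 200)},
    "early disease": {"art": (4, 200), "no_art": (20, 800)},
}

def risk(deaths, patients):
    return deaths / patients

def crude_relative_risk(strata):
    """Relative risk of death ignoring disease severity (strata collapsed)."""
    art_deaths = sum(s["art"][0] for s in strata.values())
    art_total = sum(s["art"][1] for s in strata.values())
    no_art_deaths = sum(s["no_art"][0] for s in strata.values())
    no_art_total = sum(s["no_art"][1] for s in strata.values())
    return risk(art_deaths, art_total) / risk(no_art_deaths, no_art_total)

def mantel_haenszel_relative_risk(strata):
    """Mantel-Haenszel summary relative risk, adjusted for the stratifying factor."""
    numerator = 0.0
    denominator = 0.0
    for s in strata.values():
        a, n1 = s["art"]      # deaths and total among patients on ART
        c, n0 = s["no_art"]   # deaths and total among patients not on ART
        n = n1 + n0
        numerator += a * n0 / n
        denominator += c * n1 / n
    return numerator / denominator

print(f"Crude RR: {crude_relative_risk(strata):.2f}")
for name, s in strata.items():
    stratum_rr = risk(*s["art"]) / risk(*s["no_art"])
    print(f"RR within {name}: {stratum_rr:.2f}")
print(f"Adjusted (Mantel-Haenszel) RR: {mantel_haenszel_relative_risk(strata):.2f}")
```

In this invented example the crude relative risk is above 1, yet ART is associated with lower mortality within each stratum of disease severity, because most patients on ART have advanced disease.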
3) You are conducting a case-control study to determine if taking Septrin reduces
AIDS-related mortality. You plan to include as cases 100 people who have died from
AIDS-related causes and as controls 100 people currently living with HIV. You will ask the
controls about Septrin use in the prior 6 months and ask the next-of-kin of the cases about
Septrin use by the cases in the 6 months prior to their death. What can be said about
exposure ascertainment in this study?
A. There are no foreseeable issues of bias in this study design.
B. Bias may occur. Controls may more accurately recall their Septrin use compared to
the next-of-kin of the cases, leading to differential misclassification of exposure.
C. A better strategy would be to ask the next-of-kin of both cases and controls about the
Septrin use of the study subject.
D. Both B and C.
ANSWER: D
EXPLANATION: Exposure status has been measured differently for cases and controls.
This can lead to bias because controls are more likely than the next-of-kin of the cases to
correctly report their Septrin use. In order to reduce bias in an epidemiologic study, it is
important to ensure that both exposure and outcome are assessed in the same manner
and at the same level of accuracy for all subjects. Unfortunately, this means that we must
sometimes use an inferior method of measurement (e.g., asking the next-of-kin about
exposure status) for all subjects even when there are better methods that can only be used
with a subset of subjects (e.g., only controls).
4) Which of the following is NOT TRUE about bias?
A. Bias only occurs when there is an overestimate of the association between exposure
and disease.
B. Bias occurs when the observed association in an epidemiologic study differs from the
true association.
C. Bias can occur when study subjects in a prospective study are lost to follow-up.
D. Differences in how exposure status is ascertained for cases and controls can give rise
to bias.
ANSWER: A
EXPLANATION: Bias occurs when the observed exposure-disease relationship is
different than the true association. Bias may result in a stronger or weaker observed
relationship. Bias can arise from many causes (e.g., selection bias, confounding,
ascertainment bias, indication bias, loss to follow-up) and can be present in all study designs.
Certain study designs are more prone to bias than others, but researchers must always be
alert to potential sources of bias in their study.
5) Which of the following conditions are necessary for confounding to occur?
1. Factor is associated with the disease of interest
2. Factor is a result of the disease
3. Factor is associated with the exposure of interest
4. Factor is not in the causal pathway of interest between exposure and disease
A. 1 only
B. 1, 2, and 3
C. 1, 3, and 4
D. 2 and 3
ANSWER: C
EXPLANATION: In order to be a confounder, a factor must be associated with both
disease and exposure, and the factor cannot be in the causal pathway between exposure
and disease that you are interested in investigating.
6) The figure below shows cases of Guillain-Barre syndrome in relation to the time since
influenza vaccination. What evidence does this provide in support of a causal association
between this vaccine and Guillain-Barre syndrome?
A. No alternative explanations exist
B. Association is strong
C. Association is strongest when predicted to be so
D. Observed evidence is consistent
ANSWER: C
EXPLANATION: The peak in the number of cases occurs in a window soon after
vaccination and drops back to baseline soon after. Thus we see the strongest effect in the
period we would expect.
Lecture 5: Measurement, Classification, and Misclassification
1) You are interested in investigating if using a mobile phone while driving increases the
risk of being involved in a car accident. You choose to conduct a case-control study
where you will enroll 200 people who have had a car accident in the past week as cases
and 200 people who have not had an accident in the past week as controls. Which of the
following would be the best strategy for assessing mobile phone usage by cases in
relation to accident risk?
A. Ask cases if they ever use a mobile phone while driving.
B. Ask cases if they used a mobile phone while driving at any time during the week
when they had their accident.
C. Ask cases if they used a mobile phone while driving on the same trip that they had
their accident, prior to the accident itself.
D. Ask cases if they used a mobile phone while driving during at least half of their trips
during the past week.
ANSWER: C
EXPLANATION: The objective is to measure exposure (mobile phone usage) during the
etiologically relevant time period. Based on our hypothesis of the relationship between
mobile phone usage and automobile accidents, we believe that mobile phone usage while
driving causes distraction that results in inattention to road conditions, which in turn
increases the risk of causing an accident or being unable to avoid hazardous
situations. Therefore, the etiologically relevant time period would be the time
immediately before the accident occurs. Because it may be difficult for a subject to
remember the exact timing of mobile phone usage, we choose to ask about mobile phone
usage on the same trip as the accident, excluding any usage after the accident. More
general assessments of mobile phone usage (e.g., ever using a mobile phone while driving)
do not assess the etiologically relevant time period, and would likely result in
mismeasurement.
2) What is the effect of non-differential misclassification of exposure in a cohort study?
A. The observed relative risk will be closer to the null (RR=1.0) than the true relative
risk.
B. The observed relative risk will be greater than the true relative risk.
C. The observed relative risk will be less than the true relative risk.
D. It is not possible to predict the direction of bias due to non-differential
misclassification.
ANSWER: A
EXPLANATION: Non-differential misclassification occurs when the likelihood that
exposure is misclassified does not depend on the probability that a subject will develop
disease. Non-differential misclassification of exposure results in bias, but the bias occurs
in a predictable direction: the observed measure of excess risk (e.g., relative risk or odds
ratio) is closer to the “null” (i.e., RR or OR equal to 1). As a result, when
non-differential misclassification of exposure is present, we can conclude that the true
measure of excess risk is more extreme than the observed estimate (i.e., further from 1).
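The direction of this bias can be checked with expected counts. The sketch below assumes a hypothetical cohort with a true relative risk of 2.0 and an exposure measure with 80% sensitivity and 90% specificity applied equally to subjects who do and do not develop disease. The cohort counts, the accuracy values, and the function names are illustrative assumptions, not values from this module.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)

def misclassify_exposure(exposed_cases, exposed_total,
                         unexposed_cases, unexposed_total,
                         sensitivity, specificity):
    """Expected counts after exposure is misclassified at the same rates
    in subjects with and without disease (non-differential)."""
    def reclassify(truly_exposed, truly_unexposed):
        observed_exposed = (sensitivity * truly_exposed
                            + (1 - specificity) * truly_unexposed)
        observed_unexposed = truly_exposed + truly_unexposed - observed_exposed
        return observed_exposed, observed_unexposed

    obs_exp_cases, obs_unexp_cases = reclassify(exposed_cases, unexposed_cases)
    obs_exp_total, obs_unexp_total = reclassify(exposed_total, unexposed_total)
    return obs_exp_cases, obs_exp_total, obs_unexp_cases, obs_unexp_total

# Hypothetical cohort: 200/1000 exposed and 100/1000 unexposed develop disease.
true_counts = (200, 1000, 100, 1000)
true_rr = relative_risk(*true_counts)

# Assumed accuracy of the exposure measurement (illustrative values).
observed_counts = misclassify_exposure(*true_counts, sensitivity=0.8, specificity=0.9)
observed_rr = relative_risk(*observed_counts)

print(f"True RR: {true_rr:.2f}")
print(f"Observed RR: {observed_rr:.2f}")
```

With these assumed values the observed relative risk falls between 1 and the true value of 2.0, which is the attenuation toward the null described above.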
3) Which of the following is the definition of sensitivity?
A. The probability that the PATIENT DOES NOT HAVE DISEASE, given that the
TEST IS NEGATIVE.
B. The probability that the PATIENT HAS DISEASE, given that the TEST IS
POSITIVE.
C. The probability of TESTING NEGATIVE, given that the PATIENT DOES NOT
HAVE DISEASE.
D. The probability of TESTING POSITIVE, given that the PATIENT HAS DISEASE.
ANSWER: D
EXPLANATION: Sensitivity is used to measure how successful a test is at identifying
“disease” when it is present. It is expressed as a probability. For example, a sensitivity of
0.90 means that 90% of those with disease will have a positive test.
4) From the table below, what is the specificity of the test?

                  True Disease Status
Test Results      diseased    non-diseased
positive          135         37
negative          25          163

A. [135/(135+65)] * 100 = 67.5%
B. [135/(135+37)] * 100 = 78.5%
C. [163/(163+37)] * 100 = 81.5%
D. [163/(163+25)] * 100 = 86.7%
ANSWER: C
EXPLANATION: Specificity is the probability of testing negative given that the patient
doesn’t have disease. It is calculated by dividing the number of true negatives by the
number of all patients without disease (i.e., true negatives plus false positives).
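The two definitions can be written as short functions. This is a minimal sketch: the function names are illustrative, the specificity example uses the counts from answer C (163 true negatives and 37 false positives), and the sensitivity example uses hypothetical counts chosen to match the 0.90 example given earlier.

```python
def sensitivity(true_positives, false_negatives):
    """Probability of testing positive, given that disease is present."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Probability of testing negative, given that disease is absent."""
    return true_negatives / (true_negatives + false_positives)

# Counts from answer C: 163 true negatives and 37 false positives.
print(f"Specificity: {specificity(163, 37):.1%}")   # 81.5%

# Hypothetical counts: 90 of 100 diseased patients test positive.
print(f"Sensitivity: {sensitivity(90, 10):.1%}")    # 90.0%
```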
5) A recent clinical trial found that an antiretroviral-based vaginal microbicidal gel reduces
the risk of a woman acquiring HIV from an infected partner. The investigators found that
a large proportion of women in both the experimental and placebo groups did not use the
gel every time they had sex, and that there was no difference in adherence between the
two groups. The observed relative risk associated with using the microbicidal gel was
0.61. Which of the following is a possible value for the true relative risk if there had been
perfect adherence?
A. RR = 0.40
B. RR = 0.90
C. RR = 1.00
D. RR = 1.25
ANSWER: A
EXPLANATION: This is an example of non-differential misclassification of exposure.
Therefore, the observed relative risk is biased toward the null (RR=1). Because the
observed relative risk is less than 1, this means that the true relative risk is even smaller.
The only potential value that fits this scenario is a relative risk of 0.40.
6) Which of the following is the definition of specificity?
A. The probability that the PATIENT DOES NOT HAVE DISEASE, given that the
TEST IS NEGATIVE
B. The probability that the PATIENT HAS DISEASE, given that the TEST IS
POSITIVE
C. The probability of TESTING NEGATIVE, given that the PATIENT DOES NOT
HAVE DISEASE
D. The probability of TESTING POSITIVE, given that the PATIENT HAS DISEASE
ANSWER: C
EXPLANATION: Specificity is the probability of testing negative given that the patient
doesn’t have disease. It is calculated by dividing the number of true negatives by the
number of all patients without disease (i.e., true negatives plus false positives).
7) What is the epidemiologically relevant time period when measuring exposures in relation
to a disease outcome?
A. Any time prior to the onset of disease
B. The time period during which an exposure is likely to be causally related to the
disease outcome
C. The time period during which ALL exposures result in disease onset
D. The time period during which exposures are likely to be the result of disease onset
ANSWER: B
EXPLANATION: The epidemiologically relevant time period is based on the
hypothesized mechanism by which the exposure is thought to result in the onset of
disease. For exposures with a long induction period, relevant exposures must occur well
before the onset of disease (the length of this period is based on the exposure-disease
mechanism). Exposures occurring immediately before disease onset would not be in the
epidemiologically relevant time period. Conversely, for exposures with a short induction
period (e.g., mobile phone usage and automobile accidents), the relevant time period is
shortly before the accident. Regardless of the induction period, the epidemiologically
relevant time period never includes time after the onset of disease.
Lecture 6: Data Management Practices in Health Research
1) Which of the following are considered data collection instruments?
1. CD4 count machine
2. Interviewer-administered questionnaire
3. Medical record abstraction form
4. Database for storage of study data
A. 1 and 2 only
B. 2 and 3 only
C. 1, 2, and 3
D. 1, 2, 3, and 4
ANSWER: C
EXPLANATION: A data collection instrument is a general term for any method of
collecting information about study subjects. It may be a medical device such as a CD4
count machine or a questionnaire to assess behavioral information. A database is used to
store study data, but is not itself a means of collecting data.
2) Which of the following is FALSE about skip patterns in questionnaires?
1. Skip patterns should only be used in interviewer-administered questionnaires and not
in self-administered questionnaires.
2. Skip patterns should be tested in all possible combinations to ensure the pattern works
under all conditions.
3. Skip patterns should be used to guide the user through a complicated series of
questions.
A. 1
B. 2
C. 3
D. Both 2 and 3
ANSWER: A
EXPLANATION: Skip patterns are a useful technique for guiding users through a
complicated set of questions where not all questions should be answered by all subjects.
Skip patterns can be used in questionnaires that are administered by a study interviewer,
but they can also be used in self-administered questionnaires. The skip pattern can be
more complex and more sophisticated when the questionnaire is administered by a
trained interviewer. Patterns should be simpler in self-administered questionnaires to
ensure that subjects are able to follow the skip pattern without making mistakes.
3) For the following questionnaire item, how many variables would be needed in the
database used to store the questionnaire results?
Which of the above symptoms prompted you to seek care?
(mark all that apply)
□ fever
□ diarrhea
□ vomiting
□ cough
A. 1
B. 3
C. 4
D. 5
ANSWER: C
EXPLANATION: In “check all that apply” questions, there should be one variable for
each item in the list of potential responses.
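As a rough sketch of how such an item might be stored, each response option becomes its own 0/1 variable. The variable names (the `sx_` prefix) and the function name below are hypothetical; the module does not prescribe a naming scheme or a database format.

```python
# The four response options from the questionnaire item.
SYMPTOMS = ["fever", "diarrhea", "vomiting", "cough"]

def encode_symptoms(marked):
    """Return one 0/1 variable per response option for a single questionnaire."""
    unknown = set(marked) - set(SYMPTOMS)
    if unknown:
        raise ValueError(f"Unrecognized options: {sorted(unknown)}")
    # The "sx_" prefix is an illustrative naming choice.
    return {f"sx_{name}": int(name in marked) for name in SYMPTOMS}

# A participant who marked fever and cough.
record = encode_symptoms(["fever", "cough"])
print(record)
```

Storing four separate variables keeps every combination of marked boxes recoverable, which a single variable holding one value could not do.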
4) Which of the following is TRUE about duplicate data entry?
A. Duplicate entry is time consuming and does little to improve quality and therefore
should not be used.
B. Duplicate entry will identify all errors in study questionnaires and therefore should be
used in all cases.
C. Duplicate data entry only works if all questionnaires for all study participants are
double entered.
D. Duplicate entry can be done on a subset of study questionnaires or on all study
questionnaires, depending on the demands of the study.
ANSWER: D
EXPLANATION: Duplicate entry is an effective method of detecting transcription errors
when data is entered into a database from some other source such as a paper-based
questionnaire. Duplicate entry can be conducted on all data entry as a means of catching
nearly all cases of transcription error, but this can be a costly and time-consuming
approach. Alternatively, a subset of the data entry can be conducted in duplicate to
monitor accuracy. In this situation, the researcher should set a maximum threshold for
errors based on the level of mismeasurement that is judged to be acceptable. In the event
that the error rate exceeds the threshold, the researcher would investigate the data entry
process and institute additional training or oversight as needed.
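The comparison of two entry passes can be sketched in a few lines. Everything below is illustrative: the record layout, the field names, the example values, and the 1% threshold are assumptions made for the example, not requirements stated in this module.

```python
def find_discrepancies(first_entry, second_entry):
    """Return (record_id, field, value_1, value_2) for every field that differs
    between two independent entries of the same questionnaires."""
    discrepancies = []
    for record_id in sorted(first_entry.keys() & second_entry.keys()):
        first, second = first_entry[record_id], second_entry[record_id]
        for field in sorted(first.keys() | second.keys()):
            if first.get(field) != second.get(field):
                discrepancies.append(
                    (record_id, field, first.get(field), second.get(field)))
    return discrepancies

def error_rate(first_entry, second_entry):
    """Proportion of double-entered fields whose two entries disagree."""
    shared = first_entry.keys() & second_entry.keys()
    fields_checked = sum(
        len(first_entry[r].keys() | second_entry[r].keys()) for r in shared)
    if fields_checked == 0:
        return 0.0
    return len(find_discrepancies(first_entry, second_entry)) / fields_checked

# Hypothetical double-entered subset of questionnaires.
first_pass = {
    "P001": {"age": 34, "cd4": 410, "sex": "F"},
    "P002": {"age": 27, "cd4": 385, "sex": "M"},
}
second_pass = {
    "P001": {"age": 34, "cd4": 401, "sex": "F"},   # digits transposed in cd4
    "P002": {"age": 27, "cd4": 385, "sex": "M"},
}

MAX_ERROR_RATE = 0.01  # assumed threshold; each study sets its own

rate = error_rate(first_pass, second_pass)
print(find_discrepancies(first_pass, second_pass))
if rate > MAX_ERROR_RATE:
    print(f"Error rate {rate:.1%} exceeds threshold; review the data entry process.")
```

Each discrepancy would then be resolved against the original paper form, since the comparison shows only that the two entries differ, not which one is correct.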
5) Which of the following is TRUE about the prevention, detection, and correction of errors
in study data?
A. Error checking is only the responsibility of the data clerk who enters the data into the
database.
B. Data entry errors can be minimized through good questionnaire design, training of
those administering the questionnaire, and error checking at the time of data entry.
C. Error checking is only the responsibility of the person collecting the data from the
study participant.
D. Errors in the data should never be corrected once they are entered into the database.
ANSWER: B
EXPLANATION: The accuracy of study data is the responsibility of all members of the
study team. There should be a clear protocol for how the data is checked, monitored, and
corrected.
Lecture 7: Interpretation of Epidemiologic Studies and Decision Making
The following data were obtained on 100 women newly diagnosed with ovarian cancer and a
sample of 100 demographically similar women seeking care at the same location as the women
with cancer. The first three questions are based on these data.
                                          Type of patient
Abdominal bloating in the prior month     Ovarian cancer    Other
Yes                                       43                8
No                                        57                92
Total                                     100               100
1) The sensitivity of abdominal bloating for the presence of ovarian cancer is
A. 43/(43 + 57)
B. 92/(8 + 92)
C. 43/(43 + 8)
D. Cannot be determined from these data
ANSWER: A
EXPLANATION: Sensitivity is used to measure how successful a test is at identifying
“disease” when it is present. It is calculated by dividing the number of patients who have
the disease and test positive by the number of all patients with disease (true positives plus
false negatives).
2) The specificity of abdominal bloating for the presence of ovarian cancer is
A. 43/(43 + 57)
B. 92/(8 + 92)
C. 43/(43 + 92)
D. Cannot be determined from these data
ANSWER: B
EXPLANATION: Specificity is the probability of testing negative given that the patient
doesn’t have disease. It is calculated by dividing the number of true negatives by the
number of all patients without disease (i.e., true negatives plus false positives).
3) The prevalence of occult (undiagnosed) ovarian cancer in the population from which the
200 women in this study were drawn is estimated to be 100 in 250,000. The predictive
value of abdominal bloating for the presence of ovarian cancer in this population is:
A. 43/(43 + 57)
B. 43/(43 + 8)
C. 43/20,035
D. Cannot be determined
ANSWER: C
EXPLANATION: The positive predictive value of a test is the probability that a patient
has disease, given that they test positive. While sensitivity and specificity are
characteristics of the test itself and do not depend on the prevalence of disease in the
population in which the test is used, the predictive value of a test is a function of the
sensitivity and specificity of the test and of the prevalence of disease. The positive
predictive value can be calculated as follows:
If we start with a hypothetical case in which we have 100 people with disease drawn from
the general population, then we would be drawing from a population of 250,000 people.
We then can begin constructing a new 2x2 table.
                         True disease status
Expected test result     Diseased    Non-diseased    Total
Positive
Negative
Total                    100                         250,000
We can easily calculate the number of non-diseased people in this hypothetical
population as 250,000 - 100 = 249,900
                         True disease status
Expected test result     Diseased    Non-diseased    Total
Positive
Negative
Total                    100         249,900         250,000
Now, using the sensitivity of the test (43%), we can calculate the expected number of true
positives (100 x 0.43 = 43) and the number of false negatives (100 - 43 = 57). So far,
these numbers match the original table because we started with a hypothetical population
with 100 diseased individuals.
                         True disease status
Expected test result     Diseased    Non-diseased    Total
Positive                 43
Negative                 57
Total                    100         249,900         250,000
We now use the known specificity (92%) to calculate the number of expected true
negatives and false positives. The true negatives are calculated as 249,900 x 0.92 =
229,908 and the false positives are calculated as 249,900 - 229,908 = 19,992
                         True disease status
Expected test result     Diseased    Non-diseased    Total
Positive                 43          19,992
Negative                 57          229,908
Total                    100         249,900         250,000
Now that the table is fully filled, we can calculate the positive predictive value in this
population using the following formula:
\[
\mathrm{PPV} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}} = \frac{43}{43 + 19{,}992} = \frac{43}{20{,}035} = 0.0021 \text{ or } 0.21\%
\]
As this example demonstrates, when the prevalence of disease in the population is low,
imperfect specificity, even when it is as high as 92%, will result in a situation where most
positive tests are false positives. In this example, only 0.21% of those with a positive test
actually have disease.
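The same calculation can be expressed directly in terms of sensitivity, specificity, and prevalence, without building the full table. The sketch below is one way to write it; the function name is illustrative, and the additional prevalence values in the loop are hypothetical, included only to show how the predictive value changes with prevalence.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability of disease given a positive test."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# Values from this example: sensitivity 43%, specificity 92%,
# prevalence 100 in 250,000.
ppv = positive_predictive_value(0.43, 0.92, 100 / 250_000)
print(f"PPV = {ppv:.4f}")   # 0.0021

# Hypothetical prevalences, for comparison only.
for prevalence in (0.001, 0.01, 0.10):
    value = positive_predictive_value(0.43, 0.92, prevalence)
    print(f"prevalence {prevalence:.3f}: PPV = {value:.3f}")
```

Holding sensitivity and specificity fixed, the predictive value of a positive test rises as the disease becomes more common in the population being tested.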
4) Comparison of rates of illness and death across geographic populations can be a guide to
inferences regarding disease etiology, but can be limited by:
A. Differences across the populations with regard to factors other than the one under
consideration.
B. The presence of only small differences in the prevalence of the characteristic of
concern across the populations being studied.
C. Both A and B
D. Neither A nor B
ANSWER: C
EXPLANATION: It is often tempting to draw conclusions about disease etiology by
comparing disease rates in different geographical regions. Such an approach is often
referred to as an ecological study because the unit of analysis is a group of people, often
defined by geography, with exposure measured at the group level, as opposed to other
study designs where the unit of analysis is the individual and exposure is measured
for each person. For example, if you observe a high rate of lung cancer in Country A and
a low rate of lung cancer in country B, you may start looking for differences between
countries A and B in terms of frequency of exposures that may be related to disease (e.g.,
smoking). Unfortunately, there are likely many factors that differ between the two
countries (e.g., industrial exposures that increase lung cancer risk), making it difficult to
attribute the difference in disease to a specific factor of interest. Ecological comparisons
to investigate a factor of interest may also be limited when the distribution of the factor of
interest does not vary greatly between comparison populations. For example, if you are
interested in the relationship between lack of exercise and the risk of having a heart
attack, your ability to draw conclusions by comparing the incidence of heart attacks
between two regions will be limited if the distribution of those who do and do not exercise
is similar in the two regions.
5) Do you agree or disagree with the following assertion: “The presence of some persons
with an illness who had never sustained a given exposure means that the exposure does
not have the capacity to cause the illness in question.”
A. Agree
B. Disagree
ANSWER: B
EXPLANATION: The definition of a causal relationship used by epidemiologists does not
require that all cases of disease must be due to a common exposure. Returning once
again to the example of lung cancer, the disease can result from a number of different
exposures, including environmental factors such as air pollution, tobacco smoking, and
exposure to asbestos, but may also be due to genetic factors independent of
environmental exposure. The complex set of mechanisms of disease causation does not
mean that any one of these exposures on their own is not causally related to disease.
6) A new screening test has been developed that can detect prostate cancer in men. Which of
the following pieces of information do you need to know before deciding if the screening
test should be used to test asymptomatic men for prostate cancer?
A. The proportion of positive tests that represent true cases of prostate cancer.
B. The cost of further testing and evaluation for men with a false positive test
C. If there is a treatment available that will improve the outcome for men with prostate
cancer that test positive
D. All of the above
ANSWER: D
EXPLANATION: The decision to use a screening test should be based on the positive and
negative predictive value of the test in the target population, an understanding of the full
cost of the test, including the cost of false positives, and the availability of a therapy that
changes the outcome of disease in those that test positive.
7) You are developing a screening test based on clinical criteria to detect patients who are
experiencing a myocardial infarction (MI) after they have presented at the hospital. This
test will be used to make decisions about how to manage patients. Those who test
positive will be evaluated further and treated promptly. Those who test negative will be
observed and discharged. What characteristics are you looking for in a good test?
A. High sensitivity, with lower specificity acceptable
B. High specificity, with lower sensitivity acceptable
C. Only a test with nearly perfect sensitivity and specificity
D. It depends on the prevalence of MI in the population
ANSWER: A
EXPLANATION: You want to detect as many true cases of MI as possible (high
sensitivity). In this example, it is better to have some false positives than it is to miss a
true case of MI.
Lecture 8: Multiple Variable Regression Models in Epidemiology
1) Which of the following is not a type of multivariable regression?
A. Linear regression
B. Cox proportional hazards regression
C. Logistic regression
D. All of the above are types of multivariable regression
ANSWER: D
EXPLANATION: Linear, logistic, and Cox proportional hazards regression are all
considered forms of multivariable regression. Each type of regression evaluates a
different type of outcome variable: in linear regression, the dependent, or outcome
variable is continuous. In logistic regression, the dependent variable is binary, meaning
that it can only take one of two values (e.g., diseased or non-diseased). In Cox proportional
hazards regression, the dependent variable is the amount of time from some starting
point until an outcome of interest. In all of these forms of regression, the analyst can
include multiple variables as independent or explanatory variables in the model.
2) Which of the following is not an advantage of multivariable regression?
A. It is possible to adjust for multiple confounders at the same time
B. Regression models eliminate selection bias
C. Regression models can be used to analyze case-control and cohort studies
D. Regression models can be used to estimate measures of risk commonly used in
epidemiology
ANSWER: B
EXPLANATION: Selection bias results from limitations in the study design and cannot be
controlled by regression alone. While regression methods may help to lessen the impact
of selection bias, it is best addressed by properly designing the study to minimize this form
of bias.
3) In what situation should Cox regression be used instead of logistic regression?
A. In longitudinal studies where the duration of follow-up is not equal for all study
subjects
B. To analyze a case-control study
C. When there are more unexposed study subjects than exposed subjects
D. To analyze all prospective studies
ANSWER: A
EXPLANATION: Cox regression is used to analyze time-to-event data where follow-up
time is unequal between subjects. Logistic regression can be used to analyze case-control
studies as well as cohort studies and randomized trials. However, in these latter designs
where subjects are selected based on an exposure and followed up for an outcome,
logistic regression is appropriate only when all subjects are followed for the same
amount of time and there are no subjects who are lost to follow-up.
4) Which of the following are TRUE about linear regression?
1. The outcome variable (y) should be a continuous variable.
2. The independent (exposure) variable (x) should be a continuous variable.
3. The independent (exposure) variable (x) can be either a continuous or categorical
variable.
4. Linear regression allows you to adjust for multiple variables at the same time.
A. 1 and 2
B. 1, 2, and 3
C. 1, 3, and 4
D. 1, 2, 3, and 4
ANSWER: C
EXPLANATION: Linear regression is used when the outcome variable (y) is continuous.
In a linear regression model, the independent variables can be continuous, discrete,
binary, or categorical. A strength of linear regression, as with other forms of regression,
is that it can be used to adjust for multiple confounding variables at the same time.
5) Which of the following are TRUE about logistic regression?
1. Logistic regression produces odds ratios (OR) for each variable included in the
model.
2. Logistic regression can be used to analyze case-control studies, cross-sectional
studies, and cohort studies with the same follow-up time for all participants.
3. Continuous variables should not be included as confounders in a logistic regression
model.
4. Logistic regression can be used to analyze data measuring the time from enrollment
until the onset of disease.
A. 1 and 2
B. 1, 2, and 3
C. 1, 3, and 4
D. 1, 2, 3, and 4
ANSWER: A
EXPLANATION: Logistic regression is used when the outcome variable (y) is binary.
While it is the primary means of analyzing case-control studies, logistic regression can
also be used to analyze cohort studies and randomized trials, as long as follow-up is
complete and of the same duration for all subjects. In a logistic regression model, the
independent variables can be continuous, discrete, binary, or categorical. Time to event
data is best analyzed using a Cox proportional hazards model.
Lecture 9: Qualitative Research Methods
1) How should the appropriate sample size be selected in a qualitative research study?
A. The sample size should be based on a statistical calculation to ensure adequate power
to test the a priori hypothesis being tested.
B. Qualitative studies should always conduct 15 individual interviews and 5 focus group
discussions.
C. Appropriate sample sizes for qualitative studies shouldn’t be defined ahead of time.
The sample size should be based on the principle of saturation.
D. The decision of sample size in a qualitative study should be based on the budget.
ANSWER: C
EXPLANATION: Unlike quantitative study designs, the final sample size for a qualitative
study should not be specified ahead of time. While there are general principles that can
be used to estimate the number of interviews or focus groups that should be conducted,
these estimates should be used for planning purposes and should not dictate the final
sample size. The principle of saturation is that you should continue collecting data until
you are no longer gaining new information from additional subjects. This may result in
smaller sample size than initially anticipated if you reach saturation earlier than
expected, but it may also mean that you will require a larger sample size than anticipated
if you continue to gain new information with each additional subject.
2) Which of the following is TRUE about phenomenology?
A. The goal of phenomenology is to gather an in-depth reflective description of
experiences.
B. The phenomenology approach seeks to explain why people behave in the way that
they do.
C. Research studies using phenomenology should use predetermined questions to ensure
that the a priori hypothesis can be tested.
D. Snowball sampling is not appropriate for a study using a phenomenology approach.
ANSWER: A
EXPLANATION: Phenomenology seeks to describe rather than explain the experience
and/or behavior of subjects. Phenomenology is often used to explore a new area of
investigation, and is not driven by an a priori hypothesis.
3) Which of the following is NOT an appropriate method of data collection for a
phenomenology study?
A. Informal conversations
B. Semi-structured interviews
C. Structured questionnaires
D. Focus groups
ANSWER: C
EXPLANATION: While structured questionnaires may be used to collect basic
demographic information about subjects in a qualitative study, they are not appropriate for
collecting the information that is the main subject of a qualitative study. Data collection
strategies in qualitative research require a degree of flexibility and ability to adapt and
respond to a subject’s interaction with the investigator.
4) Which of the following is TRUE about grounded theory?
A. Participants in a grounded theory study should always be randomly selected from the
target population.
B. The process of conducting a grounded theory study is iterative. The sampling strategy
is modified as new information is collected and analyzed.
C. Focus groups are never used in grounded theory.
D. The sampling strategy and approach should not be changed after the study has started.
ANSWER: B
EXPLANATION: Grounded theory involves the development and testing of hypotheses
about the process being investigated and how subjects respond to the process. The
resulting theory is grounded in data from large numbers of participants, and is the
product of an iterative process in which the investigator analyses their findings, refines
their theories, and conducts additional investigations to test and further refine these
theories.
5) Which of the following is FALSE about focus group discussions?
A. Focus groups are best suited when interactions between participants will yield more
information relevant to the research questions.
B. The facilitator for a focus group should use open-ended questions and a
non-directive and non-judgmental approach.
C. Focus groups are not well suited when asking sensitive questions that may not be
answered truthfully in a group setting.
D. The main purpose of a focus group is to get the group to agree on a set of responses to
items on a questionnaire.
ANSWER: D
EXPLANATION: Focus groups are useful when it is desirable to observe the interaction
among interviewees. The goal should not be to seek agreement among interviewees, but
rather to explore the range of responses and how these responses relate to one another.
Focus groups are best used when investigating community attitudes or perceptions of
how the community responds to the process being investigated. Focus groups are less
useful when investigating topics that are sensitive and about which subjects may not feel
comfortable giving honest responses in a group setting. Focus group facilitators should
be trained in how to use open-ended questions and non-directive approaches to maximize
the information gained from the group and to avoid biasing the group with a priori
hypotheses about the subject being investigated.
Lecture 10: Analyzing Qualitative Data and Public Health Applications
1) Which of the following is NOT a level of coding for qualitative data?
A. Descriptive coding
B. Community coding
C. Analytic coding
D. Topic coding
ANSWER: B
EXPLANATION: Descriptive, analytic, and topic coding are all levels of coding used in
the analysis of qualitative data.
2) True or False: codes used in analyzing qualitative data should be developed before the
research is started and should not be changed in order to prevent bias in the analysis.
A. True
B. False
ANSWER: B
EXPLANATION: Codes should be developed as the investigator reviews the qualitative
data. Codes should be revised and developed as additional information is collected.
Multiple analysts should be involved in developing and applying codes and in analyzing
coded data.
3) Which of the following statements are TRUE about thematic codes?
1. Thematic codes are used to describe characteristics of the data itself.
2. Thematic codes are used to describe topics present in the transcript of an interview.
3. Thematic codes can be revised and modified during the process of analyzing the data.
4. Thematic codes are only used when analyzing focus group discussions.
A. 1
B. 1 and 2
C. 2 and 3
D. 3 and 4
ANSWER: C
EXPLANATION: Thematic coding, also referred to as topic coding, is the most common
type of coding. It is used to describe topics in transcripts and other forms of qualitative
data. Thematic codes are commonly revised and modified as the investigator proceeds
through the review of the data. After preliminary codes are developed, they are typically
discussed, merged, modified, and expanded. The data are then recoded using the revised
codes.
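The revise-and-recode cycle described in this explanation can be sketched as follows. The segment texts, the code names, and the `merge_map` dictionary are hypothetical, and the sketch assumes the analysts have already discussed the preliminary codes and agreed on which ones to merge. It is an illustration of the bookkeeping only, not a substitute for that discussion.

```python
def recode(coded_segments, merge_map):
    """Apply a revised codebook to segments that were already coded.

    coded_segments: list of (segment_text, set_of_codes) pairs.
    merge_map: dict mapping a preliminary code to its revised code;
    codes not listed in merge_map are kept unchanged.
    """
    return [
        (text, {merge_map.get(code, code) for code in codes})
        for text, codes in coded_segments
    ]


# Hypothetical transcript segments with preliminary codes.
segments = [
    ("The clinic is too far to walk.", {"distance"}),
    ("Bus fare costs more than the visit.", {"transport cost", "fees"}),
]

# Hypothetical decision: merge two preliminary codes into "access".
merge_map = {"distance": "access", "transport cost": "access"}

revised = recode(segments, merge_map)
for text, codes in revised:
    print(sorted(codes), "-", text)
```

Keeping the original segments and producing a new recoded list, rather than overwriting codes in place, makes it easier to compare the preliminary and revised coding if the codebook changes again.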
4) Which of the following is FALSE about analyzing qualitative data?
A. Qualitative data may include transcripts of interviews, audio recordings, and
videotaped interviews.
B. The process of analyzing qualitative data involves an adaptive process that
incorporates new information that arises as the research is conducted.
C. Analysis of qualitative data requires the use of specialized software.
D. Codes are used to describe qualitative data, develop hypotheses, and to conduct
comparisons.
ANSWER: C
EXPLANATION: A wide range of data may be used when analyzing a qualitative study.
This may include transcripts from interviews and focus groups, notes from the
observation of subjects, or audio and video recordings. Unlike most forms of quantitative
analysis, qualitative analysis is adaptive and evolves as new information is gained. Codes
are commonly used to describe and organize qualitative data. While specialized software
can be helpful in analyzing qualitative data, it is by no means necessary, and does not
guarantee that the analysis is conducted appropriately.
5) Which of the following is NOT a characteristic of a well-conducted qualitative study?
A. The researcher conducted and reported a thorough review of the relevant literature.
B. The study used a rigid prior conceptual framework when analyzing and interpreting
the data.
C. The qualitative methods used were a good match with the research question.
D. The researcher compared their findings with others reported in the literature.
ANSWER: B
EXPLANATION: Well-conducted qualitative studies should include a comprehensive
understanding of the relevant scholarly literature in addition to a good foundation in the
methodologies of qualitative research. The investigator should have a good sense of
which questions are appropriate to be answered through qualitative research and which
questions are better answered through other approaches. The findings from a qualitative
study should be put in context with other published research, from both qualitative and
quantitative approaches. The investigator should not impose a rigid prior conceptual
framework, because doing so can introduce bias and cause the study to miss new
knowledge from study subjects that may contradict previous ideas and hypotheses.