Unit 8: Cohort Studies

Unit 8 Learning Objectives
Considering the prospective cohort study:
1. Understand the strengths and limitations of this study design.
2. Understand approaches to selecting an “exposed” population.
3. Understand approaches to selecting one or more comparison groups.
4. Recognize the primary sources of exposure and outcome information.
5. Recognize the contributions of major studies conducted in the United States:
--- Framingham Heart Study
--- Nurses’ Health Study
6. Understand the primary sources of bias.
7. Understand the purpose and methods of conducting sensitivity analyses.

Unit 9 Learning Objectives
8. Understand the design features, strengths, and limitations of retrospective cohort studies.
9. Differentiate between incidence risk and incidence rate, and between the risk ratio and the rate ratio.
10. Calculate person-time for “time-dependent” exposures.
11. Understand factors that influence the accurate classification of person-time by exposure status.
12. Understand the concept and components of the “empirical induction period.”
13. Understand the concept of “non-exposed person-time” among “exposed” subjects.

Axiom: Because most epidemiologic research is “observational” by nature, epidemiologic studies typically obtain imprecise answers, but to the right health-related questions: questions that cannot be evaluated using experimental study designs.

Review – Prospective Cohort Study
Prospective cohort (“follow-up”) study:
• Disease-free individuals are selected and their exposure status is ascertained.
• Subjects are followed for a period of time to record and compare the incidence of disease between exposed and non-exposed individuals (e.g. using a risk ratio or rate ratio).
[Diagram: Exposure → Disease, each marked “?” at study entry. Exposure may or may not have occurred at study entry; the outcome definitely has not occurred at study entry.]

Prospective Cohort Studies (also called “longitudinal” studies)

Design Features: Strengths
• Can elucidate the temporal relationship between exposure and disease (hence, the “strongest” observational design for establishing cause and effect).
• Minimizes bias in the ascertainment of exposure (e.g. recall bias).
• Particularly efficient for the study of rare exposures.
• Can examine multiple effects of a single exposure.
• Can yield information on multiple exposures.
• Allows direct measurement of the incidence of disease in exposed and non-exposed groups (hence, calculation of relative risk).

Design Features: Limitations
• Not efficient for the study of rare diseases.
• Can be very costly and time-consuming.
• Often requires a large sample size.
• Losses to follow-up can affect the validity of results.
• Changes in diagnostic methods over time may lead to biased results.

Design Features: Selection of the Exposed Population
The exposed population should relate to the hypothesis:
• For common exposures (e.g. smoking, coffee drinking) and relatively common chronic diseases, the general population or geographically defined areas are good choices.
• For rare exposures, “special cohorts” are more desirable (e.g. particular occupations, or environmental factors in specific geographic locations).
• Although cohort studies are not optimal for evaluating rare diseases, certain outcomes may be sufficiently common in “special exposure cohorts” to yield an adequate number of cases.
• To enhance validity, some exposed populations are selected because they make complete and accurate information easier to obtain (e.g. doctors, nurses, entire companies).
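The strength noted above, direct measurement of incidence in the exposed and non-exposed groups, is what lets a cohort yield ratio measures. As a minimal sketch with hypothetical counts and person-years (none of these numbers come from this unit), the code below contrasts the risk ratio, built from incidence risk (cases per person at risk), with the rate ratio, built from incidence rate (cases per unit of person-time). The `Group` class and function names are illustrative, not from any library.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Follow-up summary for one exposure group (hypothetical numbers)."""
    n_at_risk: int       # disease-free subjects enrolled at baseline
    cases: int           # incident cases observed during follow-up
    person_years: float  # total disease-free time contributed by the group

    def risk(self) -> float:
        # Incidence risk (cumulative incidence): cases / persons at risk
        return self.cases / self.n_at_risk

    def rate(self) -> float:
        # Incidence rate: cases / person-time at risk
        return self.cases / self.person_years


def risk_ratio(exposed: Group, unexposed: Group) -> float:
    return exposed.risk() / unexposed.risk()


def rate_ratio(exposed: Group, unexposed: Group) -> float:
    return exposed.rate() / unexposed.rate()


# Hypothetical 5-year cohort: 500 subjects per group
exposed = Group(n_at_risk=500, cases=50, person_years=2_375.0)
unexposed = Group(n_at_risk=500, cases=25, person_years=2_440.0)

print(f"Risk ratio: {risk_ratio(exposed, unexposed):.2f}")  # Risk ratio: 2.00
print(f"Rate ratio: {rate_ratio(exposed, unexposed):.2f}")  # Rate ratio: 2.05
```

In this invented example the two ratios differ slightly because the exposed group, with more cases, accrues less person-time at risk; they tend to be close when the outcome is rare.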
Design Features: Selection of the Comparison Group
• The groups being compared should be as similar as possible on all factors related to disease other than the exposure under investigation (e.g. to reduce the potential for confounding).
• The ability to collect adequate information from the non-exposed group is essential.

Design Features: Internal Comparison Group
• Members of a single general cohort are classified into exposed and non-exposed categories.
• Most often used for common exposures.
• The non-exposed group becomes the comparison group.
• Investigators must watch for other potential differences between the exposed and non-exposed groups.

Design Features: General Population Comparison Group
• The general population will probably include some exposed persons.
• Because of the “healthy worker effect,” the general population can be expected to experience higher mortality than most occupational cohorts.
• Comparisons with population rates are possible only for outcomes for which population rates are available.

Design Features: Special Exposure Comparison Group
• Another cohort is selected with demographic characteristics similar to the exposed group but considered non-exposed to the factor of interest (e.g. another occupational group).
Note: To enhance validity, it may be important to have multiple comparison groups.

Design Features: Sources of Exposure Information
• Pre-existing Records
Advantages:
--- Inexpensive
--- Relatively easy to work with
--- Usually unbiased, since the data were collected for non-study purposes
Disadvantages:
--- Exposure information may not be precise enough to address the research question.
--- Records frequently do not contain data on potential confounding factors.
• Self-Report (interviews, surveys, etc.)
Advantages:
--- Opportunity to question subjects on as many factors as necessary.
--- Good for collecting information on exposures that are not routinely recorded.
Disadvantages:
--- Subject to response bias (e.g. due to stigma, response expectations, etc.).
--- Subject to interviewer bias.
--- Subjects may be unaware of their exposure status (e.g. chemical exposure).
• Direct Measurement
--- If obtained in a comparable manner, can provide objective and unbiased exposure ascertainment (e.g. blood pressure, serum samples, environmental measurements).
--- Can be used on a fraction of the cohort to validate other types of exposure ascertainment.
• Repeated Measurements
--- If the frequency of exposure changes over follow-up, repeated measurements allow exposure classification to be revised.
--- Periodic questioning of cohort members allows newly identified exposures of interest to be measured.
--- Good for “transient” exposures.

Design Features: Types of Exposure Measurements
• Dichotomous (e.g. presence of a particular HLA type)
• Intensity (e.g. mean blood pressure level)
• Duration (e.g. weeks of chronic stress)
• Cumulative (e.g. pack-years of smoking)
• Regularity (e.g. frequency of episodic anger)
• Variability (e.g. range of cardiovascular reactivity)

Design Features: Sources of Outcome Information
• Death certificates (National Death Index): notoriously unreliable for some causes of death
• Clinical history
• Self-reports
• Medical examination (periodic re-examination of the cohort)
• Hospital discharge logs

Design Features: Outcome Information
• Procedures for identifying outcomes must be applied equally to all exposed and non-exposed individuals.
• The goal is to obtain complete, comparable, and unbiased information on the health experience of each study subject.
• Combining several sources of outcome data may be necessary.
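Several of the measurement types listed under Types of Exposure Measurements can be derived from one repeated-measurement record. The sketch below is illustrative only: the smoking history is invented, each follow-up interval is assumed to have a constant smoking level, and pack-years are computed as packs per day multiplied by years smoked. The function name `exposure_summaries` is hypothetical. Regularity is not computed because it needs event-level data such as episode counts.

```python
def exposure_summaries(periods):
    """Summarize one participant's exposure history in several ways.

    `periods` is a list of (years, packs_per_day) tuples, one per follow-up
    interval, assuming a constant smoking level within each interval.
    """
    smoking = [(years, ppd) for years, ppd in periods if ppd > 0]
    duration = sum(years for years, _ in smoking)            # Duration: years smoked
    pack_years = sum(years * ppd for years, ppd in smoking)  # Cumulative: pack-years
    # Intensity: time-weighted mean packs/day over the years actually smoked
    intensity = pack_years / duration if duration else 0.0
    levels = [ppd for _, ppd in smoking]
    spread = max(levels) - min(levels) if levels else 0.0    # Variability: range
    return {
        "ever_smoked": duration > 0,                         # Dichotomous
        "duration_years": duration,
        "pack_years": pack_years,
        "mean_packs_per_day": intensity,
        "range_packs_per_day": spread,
    }


# Hypothetical history from four 2-year questionnaire cycles (invented values)
history = [(2.0, 0.5), (2.0, 1.0), (2.0, 1.5), (2.0, 0.0)]
print(exposure_summaries(history))
```

Because repeated measurements update the record at each cycle, every summary can be recomputed as follow-up accumulates, which is what allows exposure classification to be revised over time.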
Prospective Cohort Study Examples
• Framingham Heart Study
• Nurses’ Health Study

Prospective Cohort Study: Framingham Heart Study
• Framingham, MA (1948): 5,000 of the town’s 30,000 residents aged 30 to 59 years and without established coronary disease participated.
• “Exposures” include smoking, obesity, elevated blood pressure, high cholesterol, physical activity, and others.
• “Outcomes” include development of coronary heart disease, stroke, gout, and others.
• Outcome events were identified by examining the study population every 2 years and by daily surveillance of hospitalizations at the only hospital in Framingham, MA.
• Participants were followed for more than 30 years.
• The study has made fundamental contributions to our understanding of the epidemiology of cardiovascular disease.
• More than 200 published reports.
• Unfortunately, the population of Framingham, MA is almost exclusively Caucasian, which limits generalizability to other racial and ethnic groups.

Prospective Cohort Study: Nurses’ Health Study
• In 1976, more than 120,000 married female nurses aged 30 to 55 from 11 U.S. states participated.
• At 2-year intervals, participants completed follow-up questionnaires on the development of outcomes and on exposures.
• “Exposures” include use of oral contraceptives, postmenopausal hormones, hair dyes, dietary fat consumption, age at first birth, and others.
• “Outcomes” include heart disease, various types of cancer, and others.
• Many new “exposures” have been added to the biennial questionnaires (e.g. electric blanket use, selenium levels).

Prospective Cohort Study: Follow-up Issues
• The major challenge is to collect follow-up data on every study subject.
• Loss to follow-up is a major source of bias and is related to:
--- Length of follow-up
--- Monitoring methods used in the study
• Multiple sources of information can be used to obtain complete follow-up information.
Prospective Cohort Study: Sources of Error (Bias)
Loss to Follow-up:
• If large (e.g. > 30%), the validity of study results may be severely compromised.
• The probability of loss to follow-up may be related to exposure, disease, or both; this may bias the estimate of the exposure/disease association.
• A “sensitivity” analysis can be used to estimate the potential effect of subjects lost to follow-up.

Prospective Cohort Study: Sensitivity Analysis
General Definition:
• Substitution of a value or range of values to evaluate the robustness of study findings, taking into account the potential impact of study limitations. For example, how might the final outcome of the analysis change when loss to follow-up is taken into account?

Sensitivity Analysis (Example)
Prospective cohort study of lumber mill occupation and low back pain.
1,000 subjects recruited:
--- 518 exposed (lumber mill workers)
--- 482 non-exposed (other workers)
100 of the 1,000 lost to follow-up:
--- 60 exposed, 40 non-exposed

Actual (900 subjects with complete follow-up):
          D+     D-    Total
E+        54    404     458
E-        44    398     442
Total     98    802     900
Incidence(E+) = 54/458 = 0.1179
Incidence(E-) = 44/442 = 0.0995
RR = 0.1179 / 0.0995 = 1.18; 95% C.I. = (0.81, 1.72)

Possible Scenarios from Loss to Follow-up:
Scenario 1 (Extreme): All 60 exposed subjects lost to follow-up developed low back pain, whereas the incidence among the 40 non-exposed lost to follow-up was the same as among the non-exposed with complete follow-up.

Scenario 1:
          D+     D-    Total
E+       114    404     518
E-        48    434     482
Total    162    838    1000
Incidence(E+) = 114/518 = 0.2201
Incidence(E-) = 48/482 = 0.0996
RR = 0.2201 / 0.0996 = 2.21; 95% C.I. = (1.61, 3.03)

Scenario 2 (Possible): The incidence among the 60 exposed lost to follow-up is twice the incidence among the 40 non-exposed lost to follow-up.
The incidence among the 40 non-exposed lost to follow-up is the same as the incidence among the 442 non-exposed in the study.

Scenario 2:
          D+     D-    Total
E+        66    452     518
E-        48    434     482
Total    114    886    1000
Incidence(E+) = 66/518 = 0.1274
Incidence(E-) = 48/482 = 0.0996
RR = 0.1274 / 0.0996 = 1.28; 95% C.I. = (0.90, 1.82)

Scenario 3 (Possible): The incidence among the 60 exposed lost to follow-up is half the incidence among the 40 non-exposed lost to follow-up. The incidence among the 40 non-exposed lost to follow-up is the same as the incidence among the 442 non-exposed in the study.

Scenario 3:
          D+     D-    Total
E+        57    461     518
E-        48    434     482
Total    105    895    1000
Incidence(E+) = 57/518 = 0.1100
Incidence(E-) = 48/482 = 0.0996
RR = 0.1100 / 0.0996 = 1.10; 95% C.I. = (0.77, 1.59)

Sensitivity Analysis: Summary
Actual:       RR = 1.18; 95% C.I. = (0.81, 1.72)
Scenario 1:   RR = 2.21; 95% C.I. = (1.61, 3.03)
Scenario 2:   RR = 1.28; 95% C.I. = (0.90, 1.82)
Scenario 3:   RR = 1.10; 95% C.I. = (0.77, 1.59)
With 10% loss to follow-up, the observed risk ratio of 1.18 appears robust to possible (but not extreme) effects of loss to follow-up (e.g. Scenarios 2 and 3).

Note: Even if loss to follow-up is low (e.g. 10%), if the incidence is very low in the observed study population (e.g. < 5%) yet relatively high among those lost to follow-up (e.g. > 15%), the observed point estimate may be severely biased: because of loss to follow-up, you missed “all of the action” (where the cases occurred).
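The loss-to-follow-up scenarios above can be recomputed mechanically: impute cases among those lost under each assumption, rebuild the 2×2 table, and recompute the risk ratio. The sketch below assumes the usual log-scale (Katz) Wald 95% interval; the slides do not state which interval method they used, but this choice reproduces the reported values to within rounding in the last digit. Imputed case counts are rounded to whole persons, and the function names are hypothetical.

```python
import math


def risk_ratio_ci(a, n1, c, n0, z=1.96):
    """Risk ratio with a Wald CI on the log scale.

    a, n1: cases and total among exposed; c, n0: cases and total among non-exposed.
    """
    rr = (a / n1) / (c / n0)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Observed lumber mill data (subjects with complete follow-up only)
OBS_A, OBS_N1 = 54, 458  # exposed: cases, total
OBS_C, OBS_N0 = 44, 442  # non-exposed: cases, total
LOST_E, LOST_U = 60, 40  # lost to follow-up: exposed, non-exposed


def scenario(risk_lost_exposed, risk_lost_unexposed):
    """Add imputed cases among those lost (rounded to whole persons) and recompute."""
    a = OBS_A + round(LOST_E * risk_lost_exposed)
    c = OBS_C + round(LOST_U * risk_lost_unexposed)
    return risk_ratio_ci(a, OBS_N1 + LOST_E, c, OBS_N0 + LOST_U)


observed_risk_unexposed = OBS_C / OBS_N0  # incidence among non-exposed in the study

results = {
    "Actual": risk_ratio_ci(OBS_A, OBS_N1, OBS_C, OBS_N0),
    "Scenario 1": scenario(1.0, observed_risk_unexposed),
    "Scenario 2": scenario(2 * observed_risk_unexposed, observed_risk_unexposed),
    "Scenario 3": scenario(0.5 * observed_risk_unexposed, observed_risk_unexposed),
}

for name, (rr, lo, hi) in results.items():
    print(f"{name:<10}  RR = {rr:.2f}  95% CI = ({lo:.2f}, {hi:.2f})")
```

Printed limits may differ from the slide values in the final digit because of rounding. Writing the scenarios as parameters makes it easy to sweep a wider range of assumed incidences among those lost, rather than checking only three points.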
Prospective Cohort Study: Sources of Error (Bias)
Misclassification of Exposure and/or Outcome:
• Random (non-differential) misclassification
• Non-random (differential) misclassification
• A “sensitivity” analysis can be used to estimate the potential effect of postulated degree(s) of misclassification.

Prospective Cohort Study: Non-Participation
• Participants often differ from non-participants in important ways.
• A “valid” result will not be affected by non-participation, although generalizability may be affected.
• The true exposure/disease relationship will be biased if non-participation is related to both the exposure and other risk factors for the outcome under study.

Review of Recommended Reading: CRP, LDL, and First CVD Events
--- Prospective cohort study within a randomized trial of 27,939 apparently healthy American women (1992-95) in the Women’s Health Study (WHS).
--- The WHS is an ongoing evaluation of aspirin and vitamin E for the primary prevention of CVD events among women > 45 years of age.
--- Before randomization, blood samples were collected and stored, and assays were performed for CRP and LDL.
--- A first CVD event was defined as non-fatal MI, non-fatal ischemic stroke, coronary revascularization, or death from cardiovascular causes.
--- Participants were followed for an average of 8 years.
--- Analyses were conducted separately by HRT status.

Discussion Question 1
Interpret the results from Figure 1 and Table 2. Of CRP and LDL cholesterol measured at baseline, which variable seems to better predict the risk of cardiovascular disease over 8 years of follow-up?
Source: NEJM 2002; 347:1557-1565.

Discussion Question 2
Interpret the results from Table 3. For the risk estimates associated with CRP, is there evidence of effect measure modification by hormone replacement therapy status? What about the risk estimates for LDL?

Discussion Question 3
Interpret the results from Figures 3 and 4. Do baseline levels of CRP and LDL cholesterol independently predict subsequent cardiovascular risk, or do they simply measure a common (shared) domain of risk?
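Returning to the misclassification bullet under Sources of Error (Bias): one common way to run a sensitivity analysis for a postulated degree of exposure misclassification, shown here as a sketch rather than as the method this unit prescribes, is to back-correct the observed counts using assumed sensitivity (Se) and specificity (Sp) of exposure classification. The values Se = 0.90 and Sp = 0.95 are hypothetical, the counts reuse the observed lumber mill table, and misclassification is assumed non-differential (the same Se and Sp among cases and non-cases). The function names are illustrative.

```python
def correct_exposed(observed_exposed, total, se, sp):
    """Back-calculate the true number exposed within one outcome stratum.

    Solves observed = se * true + (1 - sp) * (total - true) for true.
    """
    return (observed_exposed - (1 - sp) * total) / (se + sp - 1)


def corrected_risk_ratio(a, b, c, d, se, sp):
    """a, b: exposed cases / non-cases; c, d: non-exposed cases / non-cases (observed).

    Assumes non-differential exposure misclassification (same se, sp in cases
    and non-cases).
    """
    cases, noncases = a + c, b + d
    a_true = correct_exposed(a, cases, se, sp)
    b_true = correct_exposed(b, noncases, se, sp)
    c_true, d_true = cases - a_true, noncases - b_true
    if min(a_true, b_true, c_true, d_true) <= 0:
        raise ValueError("Assumed se/sp are incompatible with the observed counts")
    return (a_true / (a_true + b_true)) / (c_true / (c_true + d_true))


# Observed lumber mill table (complete follow-up only)
a, b, c, d = 54, 404, 44, 398
observed_rr = (a / (a + b)) / (c / (c + d))
corrected_rr = corrected_risk_ratio(a, b, c, d, se=0.90, sp=0.95)
print(f"Observed RR = {observed_rr:.2f}; corrected RR = {corrected_rr:.2f}")
```

Under these assumed values the corrected RR lies somewhat further from the null than the observed 1.18, consistent with the tendency of non-differential misclassification of a dichotomous exposure to bias estimates toward the null. Repeating the calculation over a grid of plausible Se and Sp values shows how sensitive the conclusion is to the postulated degree of misclassification.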