Introduction to Cohort Studies
Facilitator: G. Kwesigabo
Department of Epidemiology and Biostatistics, SPHSS, MUHAS

Session objectives
At the end of this session, the participants should be able to:
• Know the history and understand the meaning and important elements of cohort studies
• Comprehend the key issues in the design of cohort studies, including:
  – Assembling the study populations
  – Ascertainment of exposures
  – Ascertainment of outcomes
  – Data analysis and assessing associations between the exposure and disease/outcomes
• Discuss the strengths and limitations of cohort studies

Cohort Studies
• One of the observational study designs that investigate the relation between exposures and outcomes (diseases)
• By implementing studies that answer a specific research question about risk factors, it is possible to identify risk and protective factors and, eventually, causes of disease

What is a cohort?
• Cohort: Latin word for 1 of the 10 divisions of a Roman legion (1/10 of a legion)
In research methods:
• A group of individuals
  – Sharing a common experience
  – Followed up for a specified period of time
• Examples
  – Birth cohort
  – Workers at a chemical plant
  – 2020/2021 students at the School of Law, UDSM

Design Principle
• Individuals/subjects are defined on the basis of the presence or absence of exposure to a suspected risk factor for the outcome/disease being studied
• At the time exposure status is defined, all potential subjects must be free from the disease/outcome under investigation
• All eligible participants are then followed over a period of time to assess the occurrence of that outcome
• The incidence of the outcome in the exposed group is compared with that in the unexposed group to estimate the risk associated with the exposure

[Figure: Cohort study. A population at risk is divided into Exposed (disease among exposed?) and Not exposed (disease among non-exposed?). Usually prospective.]
[Figure: Types of cohort studies. Exposed and unexposed groups are placed on a timeline (PAST, NOW, FUTURE) to determine incidence: retrospective cohort study, prospective cohort study, ambidirectional.]

Types of cohort studies
• Classified as either prospective or retrospective, depending on the temporal relationship between initiation of the study and occurrence of the disease or outcome
• Retrospective: both exposure and outcome have already occurred when the study starts

Retrospective Cohort Study
[Figure: Timeline showing exposure occurrence, then disease occurrence, then study start. Selection is based on exposure, with retrospective assessment of disease. Examples: Chernobyl, industrial accidents, flood victims.]

• Prospective: exposure is present but the outcome has not yet occurred
• After selection of the cohort, participants are followed into the future to assess incidence rates of disease in both groups, i.e. the exposed and the unexposed

Prospective Cohort Study
[Figure: Timeline showing study start, then exposure occurrence, then disease occurrence. Selection of the population is followed by prospective assessment of exposure and disease.]

Ambidirectional:
• Data are collected both retrospectively and prospectively on the same cohort
• This design is useful for exposures with both short-term and long-term effects, e.g. chemicals that may increase the risk of birth defects within a few years of exposure and the risk of cancer after one or two decades
• Another example is exposure to poisonous gases or chemicals, e.g. in mines: immediate effects such as paralysis can be studied with a retrospective design, and the exposed can then be followed up for long-term effects

Selection of an exposed population
Things to consider:
• Common exposures: use a random sample of the population
• Rare exposures: use specific occupational groups, e.g. miners, asbestos workers, workers in cotton processing plants
• Accurate exposure and follow-up information must be obtained from the study participants; lack of accurate and complete information affects the validity of the study
• Feasibility of follow-up of the population to be studied:
  – High loss to follow-up
• The cohort selected should assure a sufficient number of outcomes

A comparison group is necessary in order to:
• Allow evaluation of whether the frequency of disease or outcome in the exposed group differs from what would have been expected based on the experience of a comparable group of individuals who are not exposed to the factor under study

Selection of a comparison group
• The exposed and non-exposed groups should be as similar as possible with respect to factors that may be related to the disease (outcome), except for the exposure under investigation
• Then, if there is no association between exposure and disease, the disease rates in the populations being compared will be essentially the same
• Also ensure that the information that can be obtained from the non-exposed group is adequate for comparison with the exposed population

Issues in the Design of Cohort Studies
Sources of Data: Exposure Information
• Pre-existing records
  – Consider availability for much of the cohort
  – May be inexpensive
  – Objective, bias-free categorisation of exposure status
  – But: insufficient detail and no information on potential confounders; the information may have been kept for another purpose
• Information from study subjects
  – Information on data not routinely collected
  – Questionnaires/interviews
  – Ascertainment of exposure must be comparable for all subjects

Issues in the Design of Cohort Studies
Sources of Data: Outcome Information
• Obtain complete, comparable, unbiased information
• Death certificates (potential bias when the outcome is cause-specific mortality)
• Medical records, medical aid schemes, etc.
• From study subjects
• Periodic direct medical examinations
Apply equally to exposed and non-exposed

Bias in Cohort Studies
1.
Loss to follow-up (attrition): some study participants cannot be reached
• Failure to ascertain outcome data is the major source of potential bias
• The longer the follow-up period, the more difficult it is to ensure complete data
• If loss to follow-up is large (e.g. 30-40%), the validity of the study is in question
• Loss to follow-up may be differential, i.e. greater in one arm than in the other

How to minimize loss to follow-up (attrition)
During enrollment
(i) Include populations/subjects that are less mobile; exclude subjects likely to be lost, i.e. those who are:
• Planning to move
• Non-committal
(ii) Obtain information to allow future tracking:
  – Collect subjects' contact information (email address, phone numbers, GPS coordinates and mailing address)
  – Identification numbers and tracking tags
During follow-up
• Maintain periodic contact by telephone, physical visits, etc. (this has cost implications)

2. Participation bias
• Those who accept to participate may differ from non-participants

3. Misclassification bias
• Misclassification of exposure status is common (e.g. a smoker may be recorded as a non-smoker)
• Can be random (affecting the exposed and unexposed equally) or non-random

4. Ascertainment bias
• Bias in ascertaining the outcome.
• Outcomes should be ascertained equally in both the exposed and non-exposed groups

Measure of effect (effect of exposure)
• To establish the association between exposure and disease, one has to estimate and compare measures of disease frequency in the exposed and non-exposed groups
• The association is determined by calculating the relative risk of developing the disease being investigated

Distribution of illness according to exposure in a cohort study

               ILL   NOT ILL   Total   Cumulative incidence
Exposed         a       b       a+b    a / (a+b)
Not exposed     c       d       c+d    c / (c+d)

Relative risk = Incidence exposed / Incidence not exposed = [a / (a+b)] / [c / (c+d)]

Cohort study: Exercise and heart disease (no exercise as the exposure)

                            ILL   NOT ILL   Total   Risk
No exercise (exposed)        30      10       40    30/40
> Half hr exercise daily      5      25       30    5/30

Relative Risk (RR) = (30/40) / (5/30) = 4.5

Cohort study: Exercise and heart disease (exercise as the exposure)

                                     ILL   NOT ILL   Total   Risk
> Half hr exercise daily (exposed)     5      25       30    5/30
No exercise                           30      10       40    30/40

Relative Risk (RR) = (5/30) / (30/40) = 0.22, i.e. roughly 0.2 (about an 80% reduction in the risk of heart disease, since 1 - 0.2 = 0.8)

Incidence density
• Incidence density = Events / Person-time of observation
• Usually expressed in person-years of observation (e.g. per 1000 person-years of observation)
• Relative risk = Incidence density in the exposed population / Incidence density in the non-exposed population

Interpretation of the RR
• RR = 1: null value (no association)
• RR > 1: the exposure may be associated with increased risk
• RR < 1: the exposure may be protective
• Rule out the possibility of chance by calculating the 95% CI around the RR

Interpretation of Relative Risk cont.
• Individuals who do not exercise are 4.5 times as likely to develop heart disease as those who exercise
• Test of significance for the estimated association: the 95% confidence interval around the RR

Interpret the following:
a. 0.4 (0.1-0.8)
b. 1.6 (0.9-1.8)
c.
2.2 (1.5-2.9)
(If the confidence interval includes the null value of no association, RR = 1, the association is not statistically significant)

Rate difference
• The rate difference is the difference between the incidence rates in the exposed and unexposed groups
• RD = Incidence exposed - Incidence unexposed
• RD measures the excess risk of the outcome (disease) attributable to the exposure

Attributable risk percent (AR%)
• Gives the proportion of disease among the exposed that is attributable to the exposure
• AR% = (Incidence exposed - Incidence unexposed) / Incidence exposed x 100
• Incidence of lung cancer among smokers: 70/7000 = 10 per 1000
• Incidence of lung cancer among non-smokers: 3/3000 = 1 per 1000
• RR = 10 / 1 = 10 (smokers are 10 times as likely to develop lung cancer as people who do not smoke)
• AR% = (10 - 1) / 10 x 100 = 90% (90% of the cases of lung cancer among smokers are attributed to their smoking)
• If smoking is eliminated, 90% of the cases among smokers will be eliminated

Advantages of cohort studies:
• Can elucidate the temporal relationship between exposure and disease: exposure is established before the disease develops
• Permit direct measurement of exposure-specific incidence of the disease
• Allow evaluation of multiple outcomes of the same exposure
• Particularly useful when the exposure is rare
• Prospective cohort studies are less prone to selection bias

Limitations of cohort studies:
• Expensive and time consuming; they take a long time to complete
• Liable to attrition, i.e. loss to follow-up among subjects
• Inefficient for rare diseases, which require huge sample sizes
• Not suitable for diseases with a long latency period, such as most cancers
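
Worked example in code
The measures of association in this session (relative risk, its 95% CI, rate difference, AR% and incidence density) can be reproduced with a short script. The sketch below is illustrative only. The function names are hypothetical, and the confidence interval uses the common log-scale approximation, which is an assumption because the slides do not say which method to use. The inputs are the exercise/heart disease and smoking/lung cancer figures from the session.

```python
import math


def relative_risk(a, b, c, d):
    """Relative risk from a 2x2 cohort table.

    a = exposed and ill, b = exposed and not ill,
    c = not exposed and ill, d = not exposed and not ill.
    """
    incidence_exposed = a / (a + b)
    incidence_unexposed = c / (c + d)
    return incidence_exposed / incidence_unexposed


def rr_confidence_interval(a, b, c, d, z=1.96):
    """Approximate 95% CI for the RR, computed on the log scale.

    Assumption: log-scale (large-sample) method; the slides name no method.
    """
    log_rr = math.log(relative_risk(a, b, c, d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return math.exp(log_rr - z * se), math.exp(log_rr + z * se)


def rate_difference(incidence_exposed, incidence_unexposed):
    """Excess incidence in the exposed compared with the unexposed."""
    return incidence_exposed - incidence_unexposed


def attributable_risk_percent(incidence_exposed, incidence_unexposed):
    """Percentage of disease among the exposed attributable to the exposure."""
    return (incidence_exposed - incidence_unexposed) / incidence_exposed * 100


def incidence_density(events, person_years, per=1000):
    """Events per `per` person-years of observation."""
    return events / person_years * per


if __name__ == "__main__":
    # Exercise and heart disease, with "no exercise" as the exposure.
    rr = relative_risk(30, 10, 5, 25)
    low, high = rr_confidence_interval(30, 10, 5, 25)
    print(f"RR = {rr:.2f}, 95% CI {low:.2f} to {high:.2f}")

    # Smoking and lung cancer: 70/7000 among smokers, 3/3000 among non-smokers.
    smokers, non_smokers = 70 / 7000, 3 / 3000
    print(f"RD  = {rate_difference(smokers, non_smokers) * 1000:.1f} per 1000")
    print(f"AR% = {attributable_risk_percent(smokers, non_smokers):.0f}%")
```

If the interval returned by rr_confidence_interval excludes 1, the association is statistically significant at the 5% level, which is the same rule applied to the three intervals in the interpretation exercise.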