CASE-CONTROL STUDIES
Nigel Paneth

EVOLUTION OF THE CASE-CONTROL STUDY

1. CASE
What is a case? Consolidating several different signs and symptoms into "caseness" was a key development in medicine.
(For more details see: Paneth N, Susser E, Susser M: The early history and development of the case-control study. Social & Preventive Medicine 2002; 47: 282-288 and 359-365)

2. CASE-SERIES
• Aggregating many individual cases into a group, and describing the features of the group, began in earnest in the 18th century.
• Key figure - PCA Louis in France. "The numerical method".
• Currently perhaps the single commonest kind of medical article.

3. CASE-CONTROL STUDY
• In its simplest form, comparing a case series to a matched control series.
• Possibly the first c-c study was by Whitehead in the Broad Street pump episode, 1854 (Snow did not do a c-c study).
• The first modern c-c study was Janet Lane-Claypon's study of breast cancer and reproductive history in 1926.
• Four c-c studies implicating smoking in lung cancer appeared in 1950, establishing the method in epidemiology.

FEATURES OF CASE-CONTROL STUDIES
1. DIRECTIONALITY: Outcome to exposure.
2. TIMING: Retrospective for exposure, but case-ascertainment can be either retrospective or concurrent.
3. SAMPLING: Almost always on outcome, with matching of controls to cases.

TWO CHARACTERISTICS OF CASES
1. REPRESENTATIVENESS: Ideally, cases are a random sample of all cases of interest in the source population (e.g. from vital data, registry data). More commonly they are a selection of available cases from a medical care facility (e.g. from hospitals, clinics).
2. METHOD OF SELECTION: Selection may be from incident or prevalent cases:
• Incident cases are those derived from ongoing ascertainment of cases over time.
• Prevalent cases are derived from a cross-sectional survey.

CHARACTERISTICS OF CONTROLS
• Who is the best control?
• Where should controls come from?
• If cases are a random sample of all cases in the population, then controls should be a random sample of all non-cases in the population sampled at the same time (i.e. from the same study base).
• But if study cases are not a random sample of the universe of all cases, it is not likely that a random sample of the population of non-cases will constitute a good control population.

THREE QUALITIES NEEDED IN CONTROLS
• Key concept: Comparability is more important than representativeness in the selection of controls.
• The control must be at risk of getting the disease.
• The control should resemble the case in all respects except for the presence of disease.

COMPARABILITY VS. REPRESENTATIVENESS
Usually, study cases are not a random sample of all cases in the population, and therefore controls must be selected so as to mirror the same biases that entered into the selection of cases.
It follows from the above that a pool of potential controls must be defined. This pool must mirror the study base of the cases.

STUDY BASE
Therefore, imagining the study base is a useful exercise before deciding on control selection.
The study base is composed of a population at risk of exposure over a period of risk of exposure.
Cases emerge within a study base. Controls should emerge from the same study base, except that they are not cases.
For example:
• If cases are selected exclusively from hospitalized patients, controls must also be selected from hospitalized patients.
• If cases must have gone through a certain ascertainment process (e.g. screening), controls must have also (e.g. mammogram-detected breast cancer).
• If cases must have reached a certain age before they can become cases, so must controls (thus we always match on age).
• If the exposure of interest is cumulative over time, the controls and cases must each have the same opportunity to be exposed to that exposure.
(If the case has to work in a factory to be exposed to benzene, the control must also have worked where he/she could be exposed to benzene.)

SIX ISSUES IN MATCHING CONTROLS IN CASE-CONTROL STUDIES
1. Identify the pool from which controls may come. This pool is likely to reflect the way cases were ascertained (hospital, screening test, telephone survey).
2. Control selection is usually through matching. Matching variables (e.g. age) and matching criteria (e.g. control must be within the same 5-year age group) must be set up in advance.
3. Controls can be individually matched or frequency matched.
• INDIVIDUAL MATCHING: search for one (or more) controls who meet the required MATCHING CRITERIA. PAIRED or TRIPLET MATCHING is when one or two controls, respectively, are individually matched to each case.
• FREQUENCY MATCHING: select a population of controls such that the overall characteristics of the group match the overall characteristics of the cases, e.g. if 15% of cases are under age 20, 15% of the controls are also.
4. AVOID OVER-MATCHING: match only on factors known to be causes of the disease.
5. Obtain POWER by matching MORE THAN ONE CONTROL PER CASE. In general, the number of controls per case should be no more than four, because there is little further gain of power above four controls per case.
6. Obtain GENERALIZABILITY by matching more than ONE TYPE OF CONTROL.

ADVANTAGES AND DISADVANTAGES OF C-C STUDIES
Advantages:
1. Only realistic study design for uncovering etiology in rare diseases
2. Important in understanding new diseases
3. Commonly used in outbreak investigation
4. Useful if induction period is long
5. Relatively inexpensive
Disadvantages:
1. Susceptible to bias if not carefully designed (and matched)
2. Especially susceptible to exposure misclassification
3. Especially susceptible to recall bias
4. Restricted to single outcome
5. Incidence rates not usually calculable
6. Cannot assess effects of matching variables

EXAMPLES OF PROBLEMS
• Doll's 1951 study of smoking and lung cancer.
The problem was that the control population (lung diseases other than cancer) was biased in relation to the exposure.
• MacMahon's 1981 study of coffee and pancreatic cancer. The problem was that some of the controls may have been biased in relation to the exposure, because patients with gastrointestinal diseases were not excluded from the control series, and such patients may have reduced their coffee intake on medical advice or because of symptoms.

SOME IMPORTANT DISCOVERIES MADE IN CASE-CONTROL STUDIES
1950s
• Cigarette smoking and lung cancer
1970s
• Diethylstilbestrol and vaginal adenocarcinoma
• Post-menopausal estrogens and endometrial cancer
1980s
• Aspirin and Reye's syndrome
• Tampon use and toxic shock syndrome
• L-tryptophan and eosinophilia-myalgia syndrome
• AIDS and sexual practices
1990s
• Vaccine effectiveness
• Diet and cancer

BASIC ANALYSIS OF CASE-CONTROL STUDIES FOR ONE CONTROL
Data are expressed in a four-fold table, and an odds ratio is calculated (relative risks have no meaning here - why?).

              Cases    Controls
Exposed         a         b
Unexposed       c         d

OR = ad/bc

PAIRED ANALYSIS FOR ONE CONTROL
Data are expressed in a four-fold table, and the numbers of concordant and discordant pairs are calculated. The test is McNemar's chi-squared test for paired data.

                           Case
                       Exposed    Unexposed
Controls   Exposed      Both       Mixed
           Unexposed    Mixed      Neither

PAIRED ANALYSIS FOR ONE CONTROL

                           Case
                       Exposed    Unexposed
Controls   Exposed       r          s
           Unexposed     t          u

McNemar chi-square = (t - s)^2 / (t + s)

MORE POINTS ABOUT CASE-CONTROL ANALYSIS
• The odds ratio is a good estimate of the relative risk when the disease is rare (prevalence < 20%).
• Can be extended to N > 1 controls.
• Statistical testing is by simple chi-square (unmatched analysis) or by McNemar's chi-square (matched-pairs analysis).
• Can be extended to multiple strata (Mantel-Haenszel chi-square).

THEORETICAL FOUNDATION OF CASE-CONTROL STUDIES per MacMahon and Trichopoulos
1. "Case-control studies should be viewed as efficient sampling schemes of the disease experience of the underlying open or closed cohorts" (MacMahon & Trichopoulos, p. 230)
2. "The exposure odds ratio derived from case-control studies equals the disease odds ratio derived from cohort studies" (p.231)
3. The incidence rate ratio:
   (Xe/Te) divided by (Xo/To)
can also be written as:
   (Xe/Xo) divided by (Te/To)
4. "In a case-control study based on a dynamic population, Xe and Xo (exposed and unexposed cases) are directly ascertained, and the ratio Te/To can be estimated in an unbiased way not dependent on any rare disease assumption by the ratio of exposed versus unexposed prevalent individuals at risk in the study base (the total study period cancels out)."
5. "any particular group of prevalent individuals at risk for the disease in the source population during the study period (i.e. the study base) that correctly reflects the ratio of exposed to unexposed person-time in this population over this period can be used for this purpose."
6. "To the extent that Ye/Yo (the exposure odds among the controls) is an unbiased estimate of Te/To, controls may be viewed as reflecting the person-time by exposure status," (p.231)