Case-control study
Yimin Zhu
Dept. Epidemiology & Biostatistics, School of Public Health, ZJU
Dec 4, 2023

Outline
Principle of case-control study
Design of case-control study
Data analysis
Strengths and limitations

Methods in clinical epidemiology
Observational studies (the investigator observes exposure; no intervention)
  Descriptive studies (generate a hypothesis): cross-sectional study (surveys prevalence), ecological study
  Analytical studies (examine a hypothesis): case-control study, cohort study
Experimental studies (the investigator assigns exposure; intervention; prove a hypothesis): clinical trial, field trial

Question
In the early 1940s, Alton Ochsner observed that most of the patients on whom he was operating for lung cancer had a history of cigarette smoking. He then hypothesized that cigarette smoking might be linked to lung cancer. Based only on his observations of lung cancer cases, was this conclusion valid? How could this hypothesis be tested?

1. Principle of a case-control study
To test the association between a specific exposure (smoking) and a certain disease (lung cancer):
Recruit two groups of subjects: a case group with the particular health problem or outcome, and a control group without this outcome.
Investigate the past exposure status (before disease onset in the cases) of case and control subjects.
Compare the rates of past exposure between the case and control groups, and then determine whether the exposure could account for the health condition of the cases.

Principle (figure): cases (disease: yes) and controls (disease: no) are drawn from the target population, and their past exposure status is determined. Exposure rate in cases = a/(a+b); exposure rate in controls = c/(c+d), where a and c are exposed and b and d non-exposed subjects.

Overall analysis of a case-control study
Group | Exposure | Non-exposure | Exposure rate (%)
Case | A | B | A/(A+B)
Control | C | D | C/(C+D)

If A/(A+B) ≠ C/(C+D) and this difference is statistically significant, we can infer that the exposure is associated with the risk of the disease of interest.
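The overall comparison above can be sketched in a few lines of Python. This is a minimal illustration, assuming the lecture's 2×2 layout (a, b = exposed/non-exposed cases; c, d = exposed/non-exposed controls) and the Pearson χ² formula without continuity correction; the function names are hypothetical. It is applied to the lecture's lung cancer smoking counts (688 smokers and 21 non-smokers among 709 cases; 650 and 59 among 709 controls).

```python
# Hypothetical helpers for the unmatched 2x2 comparison (a, b: cases; c, d: controls).

def exposure_rates(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Return the exposure rate among cases, a/(a+b), and among controls, c/(c+d)."""
    return a / (a + b), c / (c + d)


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Critical values of chi-square with 1 degree of freedom, strictest first.
CRITICAL_VALUES = [(10.83, 0.001), (6.63, 0.01), (3.84, 0.05)]


def p_value_bound(chi2: float) -> str:
    """Translate a chi-square statistic into a 'P < ...' statement."""
    for critical, alpha in CRITICAL_VALUES:
        if chi2 > critical:
            return f"P < {alpha}"
    return "P >= 0.05"


# Smoking among 709 lung cancer cases and 709 controls.
case_rate, control_rate = exposure_rates(688, 21, 650, 59)
chi2 = chi_square_2x2(688, 21, 650, 59)
print(f"cases {case_rate:.1%}, controls {control_rate:.1%}")
print(f"chi2 = {chi2:.2f}, {p_value_bound(chi2)}")
```

With these counts the statistic comes out at about 19.13, above the 1-df critical value of 10.83, so the difference in smoking rates is significant at P < 0.001.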
The smoking status in lung cancer patients and control subjects
Group | Smoker | Nonsmoker | Total | Smoking rate (%)
Lung cancer patients | 688 | 21 | 709 | 97.0
Controls | 650 | 59 | 709 | 91.7

$\chi^2 = \frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)} = 19.13$
χ² > 3.84: P < 0.05; χ² > 6.63: P < 0.01. Here, P < 0.001.

Basic characteristics of a case-control study
Observational, analytical study
Retrospective design: inquiry runs from effect to cause
Temporality between exposure and disease is not directly established
Uses a comparison group
(Figure: direction of inquiry. Starting from the cases and controls at the start of the study (time 0), the investigator looks back over time (−1 to −7) to determine past exposure status.)

2. How to design a case-control study
Key steps for a case-control study:
1. Define and recruit subjects (cases and controls).
2. Investigate the exposure information of the subjects.
3. Compare the exposure rates between the case and control groups.

Selection of cases
Definition: subjects with the specific disease or outcome of interest.
Clear definition of the disease
Diagnostic criteria, with inclusion and exclusion criteria, to avoid misclassification
Representativeness: the cases are a sample of the case population and should be representative of all cases. (Figure: population parameters such as μ, ρ, RR or HR are inferred from a sample drawn from the population.)
Sources of cases: hospitals, clinics, disease registries, screening or community surveys, etc. Cases may be newly diagnosed, or people who have had the disease for some time.
Types of cases: incident, prevalent and death cases. Incident cases are preferable to prevalent or death cases because they have:
1. lower recall bias
2. better representativeness
3. less influence of survival-related factors
4. higher cooperation
5. less chance that the exposure has changed

Control selection
Definition: subjects without the disease of interest, either healthy persons or patients with other diseases. Controls are not necessarily healthy persons; when patients with "other diseases" are used, those diseases should share no common risk factors with the disease of interest.
Representativeness: controls should be a representative subgroup of the same source population that gave rise to the cases.
Multiple controls: when cases are difficult to obtain (e.g., for a rare disease), selecting more controls than cases (n1 cases, n2 controls, n2 > n1) increases statistical power.
Multiple control groups: comparing the same case group with more than one control group (e.g., control groups 1, 2 and 3) allows the consistency of the results to be checked.

Sources of controls
1. Population of a defined area
2. Hospital patients (with other diseases)
3. Probability sample of the total population
4. Neighbors, friends or associates of cases
5. Siblings, spouses or other relatives

Two types of case-control study based on the sources of subjects
Population-based case-control study: high representativeness and easy to extrapolate to the target population, but subjects are difficult to recruit.
Hospital-based case-control study: subjects are easy to recruit, but representativeness is lower.

Comparability: cases and controls should have similar distributions of other characteristics, which reduces confounding bias.
Matching: the process of selecting controls so that they are similar to the cases in certain characteristics, such as age, sex, race, socioeconomic status and occupation. The matching factors are potential confounding factors.

Two types of matching
1. Frequency matching (group matching)
2. Individual matching

Frequency matching
Controls are selected according to the distribution of potential confounding factors in the case group, so that the proportions of controls with certain factors are similar to the proportions of cases (e.g., the same distribution of age, sex and race as the cases). N1 cases are drawn from the case population and N2 controls from the control population, giving a grouped case-control study.

Data in the grouped case-control study
Group | Exposure | Non-exposure | Total | Exposure rate (%)
Case | a | b | n1 | a/(a+b)
Control | c | d | n2 | c/(c+d)

Example: Green tea consumption and the risk of endometrial cancer: a population-based case-control study in urban Shanghai
Objective: To assess the effect of tea consumption on the risk of endometrial cancer.
Methods: A population-based case-control study was conducted in urban Shanghai. Face-to-face interviews were completed for 995 incident cases aged 30–69 years (Jan 1997 to Dec 2002) and 1087 female controls frequency-matched to the cases on age. An unconditional logistic model was used for the analysis.

Age distribution between case and control groups, n (%)
Age (years) | Case | Control | Total
<45 | 104 (10.45) | 124 (11.41) | 228
45– | 163 (16.38) | 172 (15.82) | 335
50– | 230 (23.12) | 255 (23.46) | 485
55– | 156 (15.68) | 165 (15.18) | 321
60– | 160 (16.08) | 186 (17.11) | 346
65– | 182 (18.29) | 185 (17.02) | 367
P = 0.918
The control group was comparable to the case group in age distribution.

Individual matching
Control subjects are selected according to the characteristics of each individual case (pair no. 1, pair no. 2, ...).
Paired case-control study: 1 case : 1 control
Matched case-control study: 1 case : R controls (R > 1), which increases the efficiency of the study when the number of cases is limited.

R | 1 | 2 | 3 | 4 | 5 | 6 | ∞
Relative efficiency | 1 | 1.33 | 1.50 | 1.60 | 1.67 | 1.71 | 2.00

Individual matching is well suited to etiological studies of rare diseases.

Data in a paired case-control study
Type of pair | Case | Control | Number of pairs
1 | + | + | a
2 | + | − | c
3 | − | + | b
4 | − | − | d

 | Case + | Case −
Control + | a | b
Control − | c | d

Example
Vaginal cancer is rare, accounting for only about 2% of tumors of the female reproductive system, and adenocarcinoma makes up only 5–10% of vaginal cancers. Vaginal cancer usually occurs as epidermoid carcinoma in women over 50 years of age. However, between 1966 and 1969, seven girls aged 15 to 22 years with vaginal adenocarcinoma were seen at the Vincent Memorial Hospital; such young patients had rarely been recorded before. The histology, age at onset and clustering of the disease were all unusual.
Adenocarcinoma of the vagina and maternal ingestion of stilbestrol
A matched case-control study was used to uncover the risk factors of this cancer.
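Returning to the green tea example, the age comparability reported there (P = 0.918) can be checked with a χ² test of independence on the 6 × 2 table of counts. This sketch assumes the slide's P value came from a Pearson χ² test (the slide does not name the test); the helper names are hypothetical, and the P value uses the closed-form chi-square upper tail for integer degrees of freedom so that only the standard library is needed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: 2*(1 - Phi(sqrt x)) + 2*phi(sqrt x) * sum_r sqrt(x)^(2r-1) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2))  # equals 2 * (1 - Phi(root))
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2 * density * total


def chi_square_test(table: list[list[int]]) -> tuple[float, int, float]:
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# (cases, controls) per age group: <45, 45-, 50-, 55-, 60-, 65-
age_table = [[104, 124], [163, 172], [230, 255], [156, 165], [160, 186], [182, 185]]
stat, df, p = chi_square_test(age_table)
print(f"chi2 = {stat:.2f}, df = {df}, P = {p:.3f}")
```

A large P value here only says the age distributions are not detectably different, which is what frequency matching on age aims for.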
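The relative-efficiency row of the 1:R matching table is consistent with the formula 2R/(R+1), which this sketch assumes; the function name is hypothetical. Exact fractions are used so that the diminishing gain from each extra control is easy to see.

```python
from fractions import Fraction


def relative_efficiency(r: int) -> Fraction:
    """Efficiency of 1:r matching relative to 1:1 pairing, assumed here to be 2r/(r+1)."""
    if r < 1:
        raise ValueError("need at least one control per case")
    return Fraction(2 * r, r + 1)


# Rounded efficiencies for 1 to 6 controls per case, as in the table.
efficiency_table = {r: round(float(relative_efficiency(r)), 2) for r in range(1, 7)}
print(efficiency_table)

# Extra efficiency gained by adding one more control per case.
for r in range(2, 7):
    gain = relative_efficiency(r) - relative_efficiency(r - 1)
    print(f"{r - 1} -> {r} controls: +{float(gain):.3f}")
```

Under this formula the gain from each additional control is 2/(R(R+1)), only about 0.07 when going from 4 to 5 controls, and the efficiency never reaches 2. This is why a few controls per case, such as the four used in the stilbestrol study, are usually enough.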
Eight patients were recruited as cases: the seven patients above plus one case from another Boston hospital. Four matched controls were selected for each case; controls were females born within five days of the case and on the same type of hospital service as the case. Dozens of variables were investigated by personal interview for cases and controls.

Main findings
(Maternal age: case / controls. Other columns: whether the case was exposed, and the number of exposed controls out of 4.)
No. | Maternal age | Maternal smoking | Bleeding in this pregnancy | Any prior pregnancy loss | Estrogen given in this pregnancy | Breast feeding | Intrauterine X-ray exposure
1 | 25 / 32 | yes, 2/4 | no, 0/4 | yes, 1/4 | yes, 0/4 | no, 0/4 | no, 1/4
2 | 30 / 30 | yes, 3/4 | no, 0/4 | yes, 1/4 | yes, 0/4 | no, 1/4 | no, 0/4
3 | 22 / 31 | yes, 1/4 | yes, 0/4 | no, 1/4 | yes, 0/4 | yes, 0/4 | no, 0/4
4 | 33 / 30 | yes, 3/4 | yes, 0/4 | yes, 0/4 | yes, 0/4 | yes, 2/4 | no, 0/4
5 | 22 / 27 | yes, 3/4 | no, 1/4 | no, 1/4 | no, 0/4 | no, 0/4 | no, 0/4
6 | 21 / 29 | yes, 3/4 | yes, 0/4 | yes, 0/4 | yes, 0/4 | no, 0/4 | no, 1/4
7 | 30 / 27 | no, 3/4 | yes, 0/4 | yes, 1/4 | yes, 0/4 | yes, 0/4 | no, 1/4
8 | 26 / 28 | yes, 3/4 | yes, 0/4 | yes, 0/4 | yes, 0/4 | no, 0/4 | yes, 1/4
Total (cases vs controls) | | 7/8 vs 21/32 | 5/8 vs 1/32 | 6/8 vs 5/32 | 7/8 vs 0/32 | 3/8 vs 3/32 | 1/8 vs 4/32
Test statistics: χ² = 4.52, 7.16 and 23.22 (P < 0.05, < 0.01 and < 0.0001); OR = 8.0, 10.5, 28.0, 2.35, 0.20, 10.0.

Conclusion: Maternal ingestion of stilbestrol during early pregnancy appears to have enhanced the risk of vaginal adenocarcinoma developing years later in the exposed offspring.

The main roles of a case-control study:
to screen risk factors from multiple candidate factors
to explore the causes of both common and rare diseases

Data collection (exposure measurement)
Collect information on exposures that occurred before disease onset.
Methods: historical records; measurements of environmental or biological samples; interviews; health check-ups.
Exposure is usually an estimate unless past measurements are available.
Each factor should have a clear definition, for example:
Never smokers were defined as subjects who had never smoked or had smoked fewer than 100 cigarettes in their lifetime.
Former smokers reported a history of smoking but had stopped at least 1 year before being diagnosed with lung cancer (or 1 year before enrollment in the study, for control subjects).
Current smokers were currently smoking or had stopped smoking less than 1 year before being diagnosed with lung cancer (or less than 1 year before enrollment in the study, for control subjects).

Interviews with a questionnaire can be conducted face to face, by telephone or over the internet.
Avoid recall bias!

Recall bias: mothers of children with and without congenital malformations
Item | With congenital malformations | Without congenital malformations | P value
Assumed true incidence of infection (%) | 15 | 15 | 1.00
Infection recalled (%) | 60 | 10 |
Infection rate ascertained by interview (%) | 9.0 | 1.5 | <0.01

Sample size estimation
An adequate sample size provides adequate statistical power. For a grouped case-control study, the sample size of each group is calculated as:
$$N = \frac{\left(Z_\alpha\sqrt{2\bar{p}\bar{q}} + Z_\beta\sqrt{p_1 q_1 + p_0 q_0}\right)^2}{(p_1 - p_0)^2}$$
where α and β are the Type I and Type II error rates and $Z_\alpha$, $Z_\beta$ the corresponding standard normal deviates; $p_1$ and $p_0$ are the expected exposure rates in the case and control groups, respectively; $q_1 = 1 - p_1$, $q_0 = 1 - p_0$; $\bar{p} = (p_0 + p_1)/2$, $\bar{q} = 1 - \bar{p}$.

Data analysis
Grouped case-control study
Group | Exposure | Non-exposure | Total | Exposure rate (%)
Case | a | b | n1 | a/(a+b)
Control | c | d | n2 | c/(c+d)

1. Is the difference in exposure between the case and control groups statistically significant?
$\chi^2 = \frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}$; χ² > 3.84: P < 0.05; χ² > 6.63: P < 0.01

2. Calculate the strength of association
Group | Exposure (+) | Non-exposure (−)
Case | A | B
Control | C | D

Odds ratio (OR)
Exposure probability for cases = A/(A+B)
Odds of exposure for cases = exposure probability / non-exposure probability = [A/(A+B)] / [B/(A+B)] = A/B
Similarly, odds of exposure for controls = C/D
The odds ratio (OR) is defined as the odds of exposure for cases divided by the odds of exposure for controls:
$OR = \frac{A/B}{C/D} = \frac{AD}{BC}$
When the disease is rare, the OR approximates the relative risk (RR).
(Figure: OR scale, with values between 0 and 1 indicating protective factors, 1 the null value, and values above 1 risk factors.)
OR < 1: exposure reduces disease risk (protective factor)
OR > 1: exposure increases disease risk (risk factor)
OR = 1: no association between exposure and disease (null value)

OR value and strength of association
OR (protective) | OR (risk) | Association
0.9–1.0 | 1.0–1.1 | none
0.7–0.8 | 1.2–1.4 | weak
0.4–0.6 | 1.5–2.9 | moderate
0.1–0.3 | 3.0–9.0 | strong
<0.1 | >10.0 | very strong

95% confidence interval (95% CI) for OR (test-based method):
$OR^{\left(1 \pm 1.96/\sqrt{\chi^2}\right)}$

The smoking status in lung cancer patients and healthy controls
Group | Smoker | Non-smoker | n
Lung cancer | 688 | 21 | 709
Control | 650 | 59 | 709
P < 0.01; OR = (688 × 59)/(21 × 650) ≈ 2.97
The odds of lung cancer in smokers were about 3 times those in non-smokers: smoking was associated with an increased risk of lung cancer.

Dose-response association
Average daily cigarettes | Lung cancer patients | Control group | OR (95% CI)
0 | 7 | 61 | 1 (reference)
1–4 | 55 | 129 | 3.7 (1.6–8.6)
5–14 | 489 | 570 | 7.5 (3.4–16.5)
15–24 | 475 | 431 | 9.6 (4.4–21.2)
25–49 | 293 | 154 | 16.6 (7.4–37.1)
50+ | 38 | 12 | 27.6 (10.0–76.2)
Total | 1357 | 1357 |

Stratification analysis
Drinking | Esophageal cancer | Control | Total
Yes | a | b | a+b
No | c | d | c+d
Total | a+c | b+d | n
OR = ad/bc
Could smoking confound this association?
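One way to answer the confounding question is to compare the crude OR with the stratum-specific ORs (smokers and non-smokers). This sketch assumes the layout a = drinking cases, b = drinking controls, c = non-drinking cases, d = non-drinking controls, and uses a hypothetical 10% relative-difference threshold to decide whether the crude OR "differs" from the stratum-specific ORs; the slides give no numeric threshold, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stratum:
    """One 2x2 table of exposure (drinking) by disease status."""
    a: int  # exposed cases
    b: int  # exposed controls
    c: int  # unexposed cases
    d: int  # unexposed controls

    def odds_ratio(self) -> float:
        return (self.a * self.d) / (self.b * self.c)


def crude_table(strata: list[Stratum]) -> Stratum:
    """Collapse the strata cell by cell, ignoring the stratifying factor (smoking)."""
    return Stratum(
        a=sum(s.a for s in strata),
        b=sum(s.b for s in strata),
        c=sum(s.c for s in strata),
        d=sum(s.d for s in strata),
    )


def confounding_check(strata: list[Stratum], threshold: float = 0.10):
    """Compare the crude OR with each stratum-specific OR.

    Returns (crude OR, list of stratum ORs, flag). The flag is True when the
    crude OR differs from every stratum-specific OR by more than `threshold`
    (relative difference), the pattern described as confounding.
    """
    crude_or = crude_table(strata).odds_ratio()
    stratum_ors = [s.odds_ratio() for s in strata]
    flag = all(abs(crude_or - s_or) / s_or > threshold for s_or in stratum_ors)
    return crude_or, stratum_ors, flag
```

A call such as `confounding_check([smokers, non_smokers])` takes one `Stratum` per smoking level; a crude OR well away from two similar stratum-specific ORs suggests that smoking distorts the crude drinking-cancer association.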
Stratified by smoking

Smoking subjects
Drinking | Esophageal cancer | Control | Total
Yes | a1 | b1 | a1+b1
No | c1 | d1 | c1+d1
Total | a1+c1 | b1+d1 | n1
$OR_1 = \frac{a_1 d_1}{b_1 c_1}$

Non-smoking subjects
Drinking | Esophageal cancer | Control | Total
Yes | a2 | b2 | a2+b2
No | c2 | d2 | c2+d2
Total | a2+c2 | b2+d2 | n2
$OR_2 = \frac{a_2 d_2}{b_2 c_2}$

If the crude OR differs from the stratum-specific ORs (OR1 and OR2), smoking has a confounding effect.

Paired case-control study (individual matching)
Type of pair | Case | Control | Number of pairs
1 | + | + | a
2 | + | − | c
3 | − | + | b
4 | − | − | d

 | Case + | Case −
Control + | a | b
Control − | c | d

$\chi^2 = \frac{(|b - c| - 1)^2}{b + c}$
$OR = c/b$
95% CI for OR: $OR^{\left(1 \pm 1.96/\sqrt{\chi^2}\right)}$

Paired case-control study of esophageal cancer
To examine the association between smoking and esophageal cancer with a paired case-control design, how would you select the control subjects?
 | Case + | Case −
Control + | 55 | 6
Control − | 26 | 6

Advantages of case-control study
Takes relatively little time, a smaller sample size and less expense; easy to complete; no follow-up is involved.
The most efficient design for rare diseases (the number of control subjects can be enlarged).
Multiple risk factors can be studied simultaneously, so it is often used to screen risk factors for complex diseases such as cancer and diabetes.

Limitations of case-control study
The time sequence between exposure and outcome is sometimes uncertain.
Risk or rate ratios cannot be estimated directly; the OR serves as a proxy.
Possible bias in the selection of subjects and in the measurement of risk factors (e.g., recall bias).

What is the difference between a case-control study and a cohort study?
Item | Case-control study | Cohort study
Classification of method | observational / analytical | observational / analytical
Intervention | no | no
Criterion for grouping | disease | exposure
Groups compared | case / control | exposed / unexposed
Temporality | retrospective | prospective
Causal sequence | from effect to cause | from cause to effect
What is compared | exposure rates between case and control groups | incidence rates between exposed and unexposed groups
Index of association | odds ratio | relative risk
Purpose in causal exploration | tests a hypothesis; rare diseases; multiple risk factors | tests a hypothesis; one risk factor; common diseases and common factors

Homework
Hypothesis: children with higher birth weights (birth weight greater than 8 lbs) are at increased risk for certain childhood cancers. To test this hypothesis with a case-control study:
How would you select the case and control subjects?
What is the exposure in this study?
What data should be collected, and how would you obtain this information?
Calculate and interpret the main results from the paired data:
Type of pair: 1, 2, 3, 4; exposure status (case, control): + + − + + −; number of pairs: 8, 18, 7, 38
+: birth weight greater than 8 lbs; −: birth weight lower than 8 lbs

Please submit your homework before next Monday!
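The paired-data formulas from the data analysis section can be sketched as below, keeping the lecture's notation (b = pairs with the case unexposed and the control exposed, c = pairs with the case exposed and the control unexposed, so OR = c/b) and the continuity-corrected McNemar χ². Applying it to the esophageal cancer pairs assumes that table lists case exposure across the columns and control exposure down the rows, as in the paired layout, which gives c = 26 and b = 6; the function name is hypothetical.

```python
import math


def paired_analysis(b: int, c: int, z: float = 1.96):
    """Matched-pair (1:1) analysis in the lecture's notation.

    b: discordant pairs with case unexposed, control exposed
    c: discordant pairs with case exposed, control unexposed
    Returns (OR, McNemar chi-square with continuity correction, (lower, upper) CI).
    """
    if b == 0 or c == 0:
        raise ValueError("both discordant cells must be non-zero")
    odds_ratio = c / b
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    if chi2 == 0:
        raise ValueError("test-based CI is undefined when chi-square is 0")
    # Test-based 95% CI: OR ** (1 +/- z / sqrt(chi2)); sorted so lower < upper even if OR < 1.
    k = z / math.sqrt(chi2)
    lower, upper = sorted((odds_ratio ** (1 - k), odds_ratio ** (1 + k)))
    return odds_ratio, chi2, (lower, upper)


or_pairs, chi2_pairs, (ci_low, ci_high) = paired_analysis(b=6, c=26)
print(f"OR = {or_pairs:.2f}, chi2 = {chi2_pairs:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

Under that reading of the table, only the 32 discordant pairs carry information: OR ≈ 4.33 and χ² ≈ 11.28 (> 6.63, so P < 0.01). The same function applies to the homework once the case and control exposure of each pair type is confirmed.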
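The sample-size formula for a grouped (unmatched) case-control study can also be evaluated numerically. This sketch assumes a two-sided α by default and power = 1 − β, with the Z values taken from Python's `statistics.NormalDist`; the function name, the defaults and the planning values in the example are assumptions, not values from the slides.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p0: float, p1: float, alpha: float = 0.05,
                          power: float = 0.80, two_sided: bool = True) -> int:
    """Subjects needed in EACH group of an unmatched case-control study.

    p0: expected exposure rate among controls; p1: expected exposure rate among cases.
    """
    if p0 == p1:
        raise ValueError("p0 and p1 must differ")
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)  # power = 1 - beta
    p_bar = (p0 + p1) / 2
    q_bar = 1 - p_bar
    numerator = (z_alpha * math.sqrt(2 * p_bar * q_bar)
                 + z_beta * math.sqrt(p1 * (1 - p1) + p0 * (1 - p0))) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


# Hypothetical planning values: 20% of controls and 40% of cases exposed.
print(sample_size_per_group(p0=0.20, p1=0.40))  # 82 per group
```

Rounding up with `math.ceil` keeps the achieved power at or above the target; raising the power to 0.90 or narrowing the gap between p1 and p0 increases the required size.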