Issues in case-control studies
Kwang Hyuck Lee, M.D., Ph.D.
lkhyuck@gmail.com
Division of Gastroenterology, Department of Medicine
Samsung Medical Center, Sungkyunkwan University School of Medicine
• Pancreas and biliary tract

SMC pancreas and biliary tract
Managing biliary tract and pancreas disease with specialized endoscopy
• Abdominal ultrasound vs. upper gastrointestinal endoscopy
• Tissue biopsy using endoscopic ultrasound (EUS)
• Endoscopic ultrasound in pancreaticobiliary disease
• ERCP (endoscopic retrograde cholangiopancreatography)

Why do medical doctors have to learn epidemiology?
• Graduate school degree
• Investigation, journal
• Academic position
• Interest in research
Ability to
• Do a case-control study
• Evaluate other papers properly

Case-control study – historical synonyms
• Retrospective study
• Trohoc study
• Case comparison study
• Case compeer study
• Case history study
• Case referent study

Case-control study
                Disease
Exposed   Yes (case)   No (control)
Yes       A1           B1
No        A0           B0
OR (cross-product ratio) = $\frac{A_1 B_0}{A_0 B_1}$

Factors associated with early detection of biliary stricture in patients with elevated liver enzymes after living donor liver transplantation
오초롱, 이광혁, 이종균, 이규택, 권준혁*, 조재원*, 조주희**
Sungkyunkwan University School of Medicine; Samsung Medical Center, Division of Gastroenterology, Transplant Surgery*, Cancer Education Center**

Study objective
• Biliary complications occur after living donor liver transplantation (LDLT)
• Success rate of endoscopic treatment, the best available treatment: around 50%
• The success rate is higher when biliary complications are detected early and endoscopic drainage is performed.
• We sought factors that predict biliary complications among patients with abnormal liver function after LDLT.
Subjects and methods
Period and patients
• Patients who underwent LDLT from January 2006 through December 2008
• Patients whose liver function recovered after surgery and then worsened again
• Only patients with duct-to-duct anastomosis were included (hepaticojejunostomy patients were excluded)
Items reviewed
• Underlying disease, symptoms
• Liver function tests
• Operative records
• Imaging studies

Analysis groups
Patients whose liver enzymes rose again after LDLT were divided into groups (elevation criteria: AST > 80, ALT > 80, ALP > 250, or bilirubin > 2.2)
• Group A: patients who needed ERCP vs. patients who did not need ERCP
• Group B: patients with anastomotic biliary stricture vs. patients with rejection
• Group C: among patients without stricture on CT, patients who needed ERCP vs. patients who did not

LDLT patients during 3 years: n = 213
Patients with LFT elevation: n = 120
• Analysis group A: need ERCP n = 74; no need for ERCP n = 46
• Analysis group B: stricture 58, rejection 23, leakage 13, infection 7, stone 3, HCC 5, viral reactivation 3, vessel stenosis 3, etc. 5
• Analysis group C: CT(-) need ERCP 32; CT(-) no need for ERCP 40

Case-control study or not?
Brock MV, et al. N Engl J Med 2008;358:900-9

Conducting case-control studies
• Case and control selection
• Exposure measurement
• Odds ratio

Research
New question ?? → Method
• Clinical study
• Translational study
• Laboratory study

Clinical study
• Observational studies
  • Case-control study vs. cohort study
• Randomized controlled trial

Why case-control studies?
• New question of interest
• A cohort study with the appropriate outcome or exposure ascertainment does NOT exist
• Need to initiate a new study
• Do you have the time and/or resources to establish and follow a new cohort?

Case-control study ??
High cholesterol → myocardial infarction
• MI (+): case
• MI (-): control
Cholesterol level
Result
• Negative
• Positive

Impetus for case-control studies: EFFICIENCY
• May not have sufficient time to see the development of diseases with long latency periods.
• May not have a sufficiently large cohort to observe outcomes of low incidence.
NOTE: Rare outcomes are not necessary for a case-control study, but they are often the motivation.
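To make the efficiency argument concrete, here is a rough sketch of the sample-size comparison for the estrogen and congenital heart defect example (Schlesselman, p. 17) presented on the next slide. It assumes the usual normal-approximation formula for comparing two proportions with equal group sizes; the function names are illustrative. Textbook tables round the z-values and use slightly different formula variants, so the result may differ from the published figures by a few subjects.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p0: float, p1: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Subjects per group to detect p0 vs p1 (two-sided test, equal group sizes).

    Normal-approximation formula for two proportions; an assumption of this
    sketch, not necessarily the exact variant used in the cited textbook.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return ceil(numerator / (p1 - p0) ** 2)


def exposure_in_cases(p0: float, odds_ratio: float) -> float:
    """Exposure prevalence among cases implied by control prevalence p0 and an OR."""
    return p0 * odds_ratio / (1 + p0 * (odds_ratio - 1))


# Cohort design: compare disease incidence, 8/1000 (unexposed) vs 16/1000 (exposed).
n_cohort = n_per_group(0.008, 0.016)

# Case-control design: compare exposure prevalence, 30% in controls vs the
# prevalence in cases implied by an odds ratio of 2 (OR ~ RR for a rare outcome).
n_case_control = n_per_group(0.30, exposure_in_cases(0.30, 2.0))

print(f"cohort: about {n_cohort} per group")
print(f"case-control: about {n_case_control} per group")
```

The exact values matter less than the roughly twentyfold gap between the two designs, which is the efficiency the slides refer to.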
Efficiency of case-control study
Do maternal exposures to estrogens around the time of conception cause an increase in congenital heart defects?
Assume RR = 2, 2-sided α = 0.05, 90% power
• Cohort study: if I0 = 8/1000 and I1 = 16/1000, would need 3889 exposed and 3889 unexposed mothers
• Case-control study: if ~30% of women are exposed to estrogens around the time of conception, would need 188 cases and 188 controls
Schlesselman, p. 17

Strengths of case-control study
• Efficient – typically:
  • Shorter period of time
  • Not as many individuals needed
  • Cases are selected, thus particularly good for rare diseases
• Informative – may assess multiple exposures and thus hypothesized causal mechanisms

Learning objectives
• Exposure
• Selection of cases and controls
• Bias: selection, recall, interviewer, information
• Odds ratios
• Matching
• Nested studies
• Conducting a case-control study
DCR Chapter 8

Exposure ascertainment – examples
• Active methods
  • Questionnaire (self- or interviewer-administered)
  • Biomarkers
• Passive methods
  • Medical records
  • Insurance records
  • Employment records
  • School records

Exposure ascertainment issues
• Establish the biologically relevant period
  • Measurement occurs once, at the current time
  • Repeated exposure
  • Previous exposure
• Measurement of exposure occurs after the outcome has developed
  • Possibility of information bias
  • Possibility of reverse causation (outcome influences the measure of exposure)

Is it possible in a case-control study? – relevant period
Smoking and radiation yesterday → cancer risk

Information bias: recall bias
Mothers of babies born with congenital malformations are more likely to recall (accurately or "over-recall") events during pregnancy such as illnesses, diet, etc.
Possibility of reverse causation
High cholesterol → myocardial infarction?
• MI (+): case
• MI (-): control
Cholesterol level → result
MI → cholesterol level decreases
Cholesterol is measured after MI

Case selection – basic tenets
• Eligibility criteria
  • Characteristics of the target and source population
  • Diagnostic criteria
• Definition of a case: misclassification
• Feasibility

Source populations – samples
• Health providers: clinics, hospitals, insurers
• Occupations: workplace, unions
• Surveillance/screening programs
• Laboratories, pathology records
• Birth records
• Existing cohorts
• Special interest groups: disease foundations or organizations

Incident versus prevalent cases
• Incident cases: all new cases of disease (that become diagnosed) in a certain period
• Prevalent cases: all current cases, regardless of when the case was diagnosed

Incident vs. prevalent
• Do the cases represent all incident cases in the target population?
• Exposure–disease association vs. exposure–survival association

Prevalent cases
Disease only: A (causal factor); A+B (B: protective factor); A+C (C: protective factor)
• Patient A: A, 1-month survival
• Patient B: A+B, 1-year survival
• Patient C: A+C, 10-year survival
Time points: 1 month, 1 year, 10 years
Prevalent cases: A, B, C → causes; intervention on B or C → ↓↓ survival

Disease severity
• Which stage is chosen for a case?
  • Early stage only
  • Late stage only
• Progression does not always occur
• Influence of severity
• Increase sample size for stratification

Early stage only
Finding risk factors of thyroid cancer
• Decrease risk factors → prevent thyroid cancer → health promotion
• Case: small thyroid cancer
• Control: normal population
• Determine the differences in exposure
• Small thyroid cancer → no progression
What is the clinical meaning of this study?

Late stage only – difficult diagnosis
Pancreatic cancer vs.
Weight
• Cases: pancreatic cancer (late stage)
• Low weight due to cancer progression
• Conclusion: low weight → pancreatic cancer
• Increase sample size for stratification

Selection bias
• Selection of cases independent of exposure status
  • Related to severity
  • Related to hospitalization or visiting

Example selection bias (1)
• Hypothesis: common cold → asthma
• Setting: patients in hospital
• Truth
  • Common cold: aggravating factor, not causal factor
  • No difference in the incidence of asthma according to common cold
  • Common cold (+) → aggravation → hospital visit
  • Common cold (-) → no symptoms → no visit

Example selection bias (2)
          Total     Common cold in society   Patients in hospital   Common cold in hospital
Asthma    1,000     10                       50                     10
General   200,000   2,000                    1,000                  20 (10 + alpha)

                Cause positive   Cause negative
Case (asthma)   10               40
Control         1                49
$\mathrm{OR} = \frac{10 \times 49}{40 \times 1} = 12.25$

Case and control selection
Same distribution of risk factors ??
Guallar E, et al. N Engl J Med 2002;347:1747-54

Selection of controls – basic tenets
• Same target population as the cases
• Selection needs to be independent of exposure
  • Should have the same proportion of exposed to non-exposed persons as the underlying cohort (source population)
• Confirmation of lack of outcome/disease
• Should answer yes to: if they had developed the disease of interest during the study period, would they have been included as a case?

Selecting controls – same as case source
Characteristics
1. Convenient
2. Most likely same target population
3. Rule out outcome – avoids misclassification
4. Similar factors leading to inclusion into source population
5. Sometimes impractical
Examples
• Breast cancer screening program
  • Confirmed breast cancer – cases
  • No breast cancer – controls
• Same hospital as case series
  • Similar referral pattern – examine by illness types

Source for controls
• Geographic population
  • Roster needed
  • Probability sampling
• Neighborhood controls
  • Random sample of the neighborhood
• Friends and family members
• Hospital-based control

Selection of controls: friends or family members
• Ask each case for a list of possible friends who meet eligibility criteria
• Randomly select among the list
• Type of matching – will be addressed later
• Concerns:
  • May inadvertently select on exposure status; that is, people are friends because they engage in similar activities or have similar characteristics/culture/tastes: "over-matching"
Am J Epidemiol 2004;159:915-21

Selection of controls: hospital or clinic-based
• Strengths
  • Ease and accessibility
  • Avoid recall bias
• Concerns
  • Selection bias: exposure related to the hospitalization
  • A mixture of the best defensible controls
• Referral pattern: same or not

Diet pattern: colon cancer
The study was conducted at a GI cancer referral center
• Case: colon cancer (+) in the GI clinic
• Control: colon cancer (-) in the respiratory clinic
  • GI clinic: dietary information related to GI cancer in the waiting room
  • Respiratory clinic
• The difference may reflect a difference between the clinics of the two groups rather than a difference in disease.
• Control: gastric cancer (+) in the GI clinic
Guallar E, et al. N Engl J Med 2002;347:1747-54

Weakness of case-control studies
• Time period from which the cases arose
  • Survival factor, reverse causation
  • Biologically relevant period
• Only one outcome measured
• Susceptibility to bias
  • Separate sampling of the cases and controls
  • Retrospective measurement of the predictor variables

Issues in case-control studies
Eliseo Guallar, MD, DrPH
eguallar@jhsph.edu
Juhee Cho, M.A., Ph.D.
jcho@samsung.com

Case and control selection
Same distribution of risk factors ??

Selection of cases
Case selection in hospitals / control selection in general population
• Alcohol → hip fractures: all visit hospitals
• IUD → abortion (1st abortion): some visit but others do not
• Women with IUD in the general population visit clinics more frequently

Target population                        Study sample
              Disease   No disease                     Disease   No disease
Exposed       A         B                Exposed       a         b
Non-exposed   C         D                Non-exposed   c         d

1st abortion: 3% rate and no relation to IUD
General population
          Total   Abortion   No abortion
IUD (+)   1,000   30         970
IUD (-)   9,000   270        8,730

Hospital population (IUD: frequent visits)
          Visit rate   Abortion   No abortion
IUD (+)   90%          27         873
IUD (-)   45%          122        3,930

Without selection
IUD     Case   Control
Yes     10     10
No      90     90
Total   100    100

Hospital cases, general-population controls
IUD     Case   Control
Yes     27     15
No      122    134
Total   149    149

IUD     Case          Control       Total
Yes     27 (18.1%)    15 (10.1%)    42 (14.1%)
No      122 (81.9%)   134 (89.9%)   256 (85.9%)
Total   149 (100%)    149 (100%)    298 (100%)
Pearson chi2(1) = 3.9911, Pr = 0.046

How to overcome
• Control: general population → difference due to frequent visits
• Control: hospital population → theoretically the same, unless this control group has higher abortion rates due to other problems
• Control mixture: both

Criticisms from papers
• Limited cases
• Selection bias from control selection
• To make your paper better than previous studies

Nomura A, et al. N Engl J Med 1991;325:1132-6

Selection bias in nested case-control study
• Controls were excluded if they had had gastrectomy or a history of peptic ulcer disease
• Controls with cardiovascular disease or cancer at baseline or during follow-up were excluded

Target population                        Study sample
              Disease   No disease                     Disease   No disease
Exposed       A         B                Exposed       a         b
Non-exposed   C         D                Non-exposed   c         d

At GI clinic: MacMahon B, et al.
N Engl J Med 1981;304:630-3
Exclude other diseases: MacMahon B, et al. N Engl J Med 1981;304:630-3

Selection bias in case-control study
• Controls were largely patients with diseases of the gastrointestinal tract
• Control patients may have reduced their coffee intake as a consequence of GI symptoms

Target population                        Study sample
              Disease   No disease                     Disease   No disease
Exposed       A         B                Exposed       a         b
Non-exposed   C         D                Non-exposed   c         d

Antunes CMF, et al. N Engl J Med 1979;300:9-13
• Non-GY control: 6.0
• GY control: 2.1

Criticisms of prior case-control studies
Diagnostic surveillance bias
• Women on estrogens are evaluated more intensively – they are more likely to be diagnosed and to be diagnosed at earlier stages
• Women with asymptomatic cancer who receive estrogens are more likely to bleed and to be diagnosed
Antunes CMF, et al. N Engl J Med 1979;300:9-13

To avoid selection bias in case-control studies
• Selection of cases
  • Types of cases selected (non-fatal, symptomatic, advanced)
  • Response rates among cases
  • Relation of selection to exposure – are exposed cases more (or less) likely to be included in the study?
• Selection of controls
  • Type of controls (general population, hospital, friends and relatives)
  • For hospital controls, diseases selected as control conditions
  • Response rate among controls
  • Relation of selection to exposure – are exposed controls more (or less) likely to be included in the study?
• Similar response rates in cases and controls do NOT rule out selection bias

Recall issues
• All information in case-control studies is historic, so if relying on reporting by participants, accuracy depends on recall
• Concerns:
  • Do cases recall prior events differently from controls?
  • Mindset of someone with disease: is there something that I did that may have caused the disease?
Recall bias (information bias)

Recall bias – example
Mothers of babies born with congenital malformations are more likely to recall (accurately or "over-recall") events during pregnancy such as illnesses, diet, etc.

Folic acid and neural tube defects
Figure 1: Features of neural tube development and neural tube defects. Botto et al. Neural tube defects. NEJM 1999. (28th day after fertilization)

Background and aim
• A reduced recurrence risk of neural tube defects (NTDs) has been reported among women receiving multivitamin supplements containing folic acid.
• Most NTDs are de novo; less than 10% of NTDs are recurrent.
• Aim: first occurrence of NTDs only, and periconceptional folate supplements

Study population
• Pregnant women: target → source → study
• Case: NTDs
• Control: other major malformations, chosen because of recall bias
• Subjects with oral clefts were excluded because vitamin supplementation has been hypothesized to reduce the risk: selection bias

Overall data
Folate (+): OR = 0.6 (0.4 – 0.8)

Recall bias: previous knowledge

Recall bias quantification (1,000 cases, 1,000 controls)
                Case   Control   OR (as case/control ratio)   Recall rate    In this study
Real            500    800       0.625                        Control: 75%
All             400    600       0.667                        Case: 80%      0.6
Prev. known     450    600       0.750                        Case: 90%      0.8
Prev. unknown   375    600       0.625                        Case: 75%      0.4

Recall bias – assessment / avoidance
• Check with recorded information, if possible
• Use objective markers or surrogates for exposure – be careful of markers that are affected by disease
• Ask participants to identify which factor(s) are important for disease
• Build in a false risk factor to test for overreporting
• Use controls with another disease

Selection bias
• If oral clefts were included in the control group, the number of controls with exposure (lack of vitamin supplement or folate intake) would increase.
• As B increases, the probability of rejecting the null hypothesis decreases.
• Cleft = ↓ intake of vitamin

               Case   Control
Exposure (+)   A      B
Exposure (-)   C      D
Exposure: lack of folate intake

Methods
Periconceptional folic acid exposure was determined by interview with study nurses:
• Demographic and health behavior factors
• Reproductive history
• Family history of birth defects
• Occupation
• Illnesses (chronic and during pregnancy)
• Use of alcohol, cigarettes and medications
• Vitamin use from 6 months before the LMP through the end of pregnancy
• Semi-quantitative food frequency questionnaire
• Knowledge of vitamins and birth defects

Confounding
• Exposure: ↓ folate intake
• Confounder: alcohol
• Outcome: ↑ NTDs

Interviewer bias
Differential interviewing of cases and controls, i.e., the interviewer may probe or interpret responses differently
Interviewer bias (information bias)

Interviewer bias – avoidance / assessment
• Self-administered instruments (prone to more non-response)
• Standardized instruments
• Computerized instruments (CADI, ACASI)
• Avoid open-ended questions; use closed questions that list each possible response
• Training
• Masking interviewers to the research question
• Masking interviewers to case/control status
• Same interviewers for cases and controls

Odds ratio
                Disease
Exposed   Yes   No
Yes       A1    B1
No        A0    B0
OR (cross-product ratio) = $\frac{A_1 B_0}{A_0 B_1}$

Example: CHD and diabetes
                 CHD
Diabetes   Yes   No
Yes        183   65
No         575   735
$\mathrm{OR}_{\mathrm{CHD}} = \frac{183 / 65}{575 / 735} \approx 3.60$
No units!
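As a quick check of the cross-product ratio, the sketch below recomputes the CHD and diabetes example from the counts printed on the slide. The helper name and the Woolf (log-scale) 95% confidence interval are additions for illustration and are not part of the original slides. With the counts as printed, the cross-product ratio evaluates to about 3.60.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a1: int, b1: int, a0: int, b0: int, z: float = 1.96):
    """Cross-product odds ratio with a Woolf (log-scale) confidence interval.

    a1/b1: exposed with/without disease; a0/b0: unexposed with/without disease.
    """
    or_hat = (a1 * b0) / (a0 * b1)
    se_log = sqrt(1 / a1 + 1 / b1 + 1 / a0 + 1 / b0)
    lower = exp(log(or_hat) - z * se_log)
    upper = exp(log(or_hat) + z * se_log)
    return or_hat, lower, upper


# Diabetes (exposure) by CHD (outcome), counts as printed on the slide.
or_chd, lo, hi = odds_ratio_ci(a1=183, b1=65, a0=575, b0=735)
print(f"OR = {or_chd:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Because the interval is built on the log scale, it is asymmetric around the point estimate, which matches the multiplicative nature of the odds ratio.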
Some properties of odds ratios
• Null value: OR = 1
• OR ≥ 0 (cannot be negative)
• Multiplicative scale (be careful with plots)
• Use logistic regression to estimate multivariate adjusted odds ratios in case-control studies

Odds ratios and the "rare disease assumption"
• With incidence density sampling (represents the underlying cohort at the time of each case) and sampling of cases and controls independent of exposure: OR ≈ IR ratio (incidence rate ratio)
• With outcomes of very low incidence in the underlying cohort and sampling of cases and controls independent of exposure: OR ≈ RR
  • Higher incidence increases the bias away from the null

Matching
• Individual matching
  • Up to 1:5
• Frequency matching
  • Case selection → confounder frequency → matching
• Stratified sampling
  • Stratification → selection of cases and controls

Odds ratio – matched pairs
Case   Control   # pairs
A1     B1        n11
A1     B0        n10
A0     B1        n01
A0     B0        n00
N = total # pairs
N pairs = N cases and N controls = 2N people
Antunes CMF, et al. N Engl J Med 1979;300:9-13

Matching
• Cannot examine the independent effect of the matched variable on the outcome
• May inadvertently match
  • On the exposure itself or its surrogate
  • On a factor in the causal pathway
  • On a factor that is affected by the outcome
• Logistical complexity of matching
• Particularly useful when the distribution of confounders is very different in cases and controls

Designing a case-control study: Overview I
• What is the research question?
• In what target population? What source(s) will be used? How long will recruitment take?
• What is the definition of the cases? What confirmation is needed? Is screening/additional testing necessary?
• Will prevalent cases be used? Does exposure influence the disease prognosis?
• What is the underlying cohort?
• How many cases are seen per year in the source?
Designing a case-control study: Overview II
• What are the eligibility criteria for controls?
• What source(s) will be used to identify controls? Do they represent the same underlying cohort as the cases?
• What confirmation is needed? Is screening/additional testing necessary?
• Sampling methods? Will the controls be selected throughout the study period? Can they be selected as cases if they later develop disease?
• Do additional sources need to be used?
• For both cases and controls, does exposure status affect inclusion in source populations or participation?

Designing a case-control study: Overview III
• Are there known confounders? Should matching be used?
• What methods will be used to recruit cases and controls?
• What methods will be used to obtain information about exposures and potential confounders? Active / passive?
• Are the methods of data collection objective and independent of case/control status?
• What methods are in place to avert and monitor differential recall by case/control status if interviewing is involved?
• If the study involves personnel-administered data collection, are the personnel masked to case-control status?

Summary
• What is the study question?
  • Appropriate?
• Duration of recruitment
• Definition of cases
  • Prevalent cases
• Eligibility of controls
  • Represent the target population
  • Other sources
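As a closing worked example of why the source of controls matters, the sketch below recomputes the IUD and first-abortion illustration from the case-selection slides: a 3% abortion rate unrelated to IUD use, with IUD users reaching the clinic at 90% versus 45% for non-users. Variable and function names are illustrative, and the 2×2 counts passed to the chi-square helper are the rounded values shown on the slide.

```python
from math import erfc, sqrt

# General population: abortion rate 3%, unrelated to IUD use.
population = {"iud": 1000, "no_iud": 9000}
abortion_rate = 0.03
visit_rate = {"iud": 0.90, "no_iud": 0.45}  # IUD users visit clinics more often

# Expected hospital cases = abortions that reach the clinic.
cases = {k: population[k] * abortion_rate * visit_rate[k] for k in population}

# Controls drawn from the general population keep the true 10% exposure.
p_exposed_controls = population["iud"] / sum(population.values())

# Biased OR from expected (unrounded) counts: cases' exposure odds over controls'.
expected_or = (cases["iud"] / cases["no_iud"]) / (
    p_exposed_controls / (1 - p_exposed_controls)
)


def pearson_chi2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Rounded counts from the slide: exposed cases, exposed controls,
# unexposed cases, unexposed controls.
chi2, p_value = pearson_chi2(27, 15, 122, 134)
observed_or = (27 * 134) / (122 * 15)

print(f"expected OR = {expected_or:.2f}, observed OR = {observed_or:.2f}")
print(f"chi2(1) = {chi2:.4f}, p = {p_value:.3f}")
```

Under these assumptions the spurious odds ratio of about 2 equals the ratio of the two visit rates (0.90 / 0.45), so it reflects how cases reach the hospital and not any effect of the IUD.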