Review of observational study design and basic statistics for contingency tables

Coffee Chronicles
BY MELISSA AUGUST, ANN MARIE BONARDI, VAL CASTRONOVO, MATTHEW JOE'S BLOWS

Last week researchers reported that coffee might help prevent Parkinson's disease. So is the caffeine bean good for you or not? Over the years, studies haven't exactly been clear:

According to scientists, too much coffee may cause...
1986: phobias, panic attacks
1990: heart attacks, stress, osteoporosis
1991: underweight babies, hypertension
1992: higher cholesterol
1993, 08: miscarriages
1994: intensified stress
1995: delayed conception

But scientists say coffee also may help prevent...
1988: asthma
1990: colon and rectal cancer, ...
2004: Type II Diabetes (*6 cups per day!)
2006: alcohol-induced liver damage
2007: skin cancer

Medical Studies
The General Idea…
Evaluate whether a risk factor (or preventative factor) increases (or decreases) your risk for an outcome (usually disease, death or intermediary to disease).
[Diagram: Exposure → ? → Disease]

Observational vs. Experimental Studies
Observational studies – the population is observed without any interference by the investigator
Experimental studies – the investigator tries to control the environment in which the hypothesis is tested (the randomized, double-blind clinical trial is the gold standard)

Limitation of observational research: confounding
Confounding: risk factors don't happen in isolation, except in a controlled experiment.
– Example: In a case-control study of a salmonella outbreak, tomatoes were identified as the source of the infection. But the association was spurious. Tomatoes are often eaten with serrano and jalapeno peppers, which turned out to be the true source of infection.
– Example: Breastfeeding has been linked to higher IQ in infants, but the association could be due to confounding by socioeconomic status.
Women who breastfeed tend to be better educated and have better prenatal care, which may explain the higher IQ in their infants.

Confounding: A major problem for observational studies
[Diagram: Exposure → ? → Disease, with a Confounder linked to both]

Why Observational Studies?
– Cheaper
– Faster
– Can examine long-term effects
– Hypothesis-generating
– Sometimes, experimental studies are not ethical (e.g., randomizing subjects to smoke)

Possible Observational Study Designs
– Cross-sectional studies
– Cohort studies
– Case-control studies

Cross-Sectional (Prevalence) Studies
Measure disease and exposure on a random sample of the population of interest. Are they associated?
Marginal probabilities of exposure AND disease are valid, but the design measures association only at a single time point.

The 2x2 Table

                   Exposure (E)   No Exposure (~E)   Total
Disease (D)        a              b                  a+b
No Disease (~D)    c              d                  c+d
Total              a+c            b+d                T

Marginal probability of exposure: (a+c)/T = P(E); (b+d)/T = P(~E)
Marginal probability of disease: (a+b)/T = P(D); (c+d)/T = P(~D)

Example: cross-sectional study
Relationship between atherosclerosis and late-life depression (Tiemeier et al. Arch Gen Psychiatry, 2004).
Methods: Researchers measured the prevalence of coronary artery calcification (atherosclerosis) and the prevalence of depressive symptoms in a large cohort of elderly men and women in Rotterdam (n=1920).
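The marginal probabilities in the 2x2 table can be sketched in a few lines of Python. This is only an illustration: the function name `marginal_probabilities` is hypothetical, the cell labels a, b, c, d follow the slide layout (columns = exposure, rows = disease), and the counts are the ones the lecture reports for the Rotterdam example (28, 53, 511, 1328).

```python
def marginal_probabilities(a, b, c, d):
    """Marginal probabilities for a 2x2 table laid out as on the slide:
    a = D and E, b = D and ~E, c = ~D and E, d = ~D and ~E."""
    total = a + b + c + d
    return {
        "P(E)": (a + c) / total,   # marginal probability of exposure
        "P(~E)": (b + d) / total,
        "P(D)": (a + b) / total,   # marginal probability of disease
        "P(~D)": (c + d) / total,
    }


# Rotterdam counts: E = coronary calcification > 500, D = any depression.
probs = marginal_probabilities(a=28, b=53, c=511, d=1328)
for name, value in probs.items():
    print(f"{name} = {value:.3f}")
```

Because the sample is a random draw from the population, both pairs of marginals are meaningful here; that is the property the cross-sectional design buys.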
Example: cross-sectional study
P("D") = Prevalence of depression (sub-threshold or depressive disorder): (20+13+12+9+11+16)/1920 = 4.2%
P("E") = Prevalence of atherosclerosis (coronary calcification >500): (511+12+16)/1920 = 28.1%

The 2x2 table:

                       Any depression   None   Total
Coronary calc >500     28               511    539
Coronary calc <=500    53               1328   1381
Total                  81               1839   1920

P(depression) = 81/1920 = 4.2%
P(atherosclerosis) = 539/1920 = 28.1%
P(depression/atherosclerosis) = 28/539 = 5.2%

Difference of proportions Z-test (same 2x2 table; "unblocked" means coronary calc <=500):

$$\hat{p}_{\text{depression/atherosclerosis}} = \frac{28}{539} = .052; \qquad \hat{p}_{\text{depression/unblocked}} = \frac{53}{1381} = .038$$

$$Z = \frac{\text{difference}}{\text{s.e.(difference)}} = \frac{.052 - .038}{\sqrt{\dfrac{(.042)(1-.042)}{539} + \dfrac{(.042)(1-.042)}{1381}}} = \frac{.0136}{.0102} = 1.33; \qquad p = .18$$

Here .042 is the pooled proportion 81/1920, and .0136 is the difference before rounding the two proportions.

Or, use relative risk (risk ratio), from the same 2x2 table:

$$RR = \frac{.052}{.038} = 1.37; \qquad 95\%\ \text{CI}\ (0.86,\ 2.19)$$

Interpretation: those with coronary calcification are 37% more likely to have depression (not significant).

Or, use chi-square test:

Observed:

                       Any depression   None   Total
Coronary calc >500     28               511    539
Coronary calc <=500    53               1328   1381
Total                  81               1839   1920

Expected:

                       Any depression        None                   Total
Coronary calc >500     539*81/1920 = 22.7    539 - 22.7 = 516.3     539
Coronary calc <=500    81 - 22.7 = 58.3      1381 - 58.3 = 1322.7   1381
Total                  81                    1839                   1920

Chi-square test:

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

$$\chi^2_1 = \frac{(28-22.7)^2}{22.7} + \frac{(53-58.3)^2}{58.3} + \frac{(511-516.3)^2}{516.3} + \frac{(1328-1322.7)^2}{1322.7} = 1.77; \qquad p = .18$$

Note: 1.77 = 1.33^2, so the chi-square test with 1 degree of freedom and the Z-test agree. (The value 1.77 comes from the unrounded expected counts.)

Chi-square test also works for bigger contingency tables (RxC):

Observed:

Coronary calcification   No depression   Subthreshold depressive symptoms   Clinical depressive disorder   Total
0-100                    865             20                                 9                              894
101-500                  463             13                                 11                             487
>500                     511             12                                 16                             539
Total                    1839            45                                 36                             1920

Expected:

Coronary calcification   No depression                  Subthreshold depressive symptoms   Clinical depressive disorder
0-100                    894*1839/1920 = 856.3          894*45/1920 = 21                   894 - (21+856.3) = 16.7
101-500                  487*1839/1920 = 466.5          487*45/1920 = 11.4                 487 - (466.5+11.4) = 9.1
>500                     1839 - (856.3+466.5) = 516.2   45 - (21+11.4) = 12.6              36 - (16.7+9.1) = 10.2

Chi-square test (4 degrees of freedom):

$$\chi^2_4 = \frac{(865-856.3)^2}{856.3} + \frac{(20-21)^2}{21} + \frac{(9-16.7)^2}{16.7} + \frac{(463-466.5)^2}{466.5} + \frac{(13-11.4)^2}{11.4} + \frac{(11-9.1)^2}{9.1} + \frac{(511-516.2)^2}{516.2} + \frac{(12-12.6)^2}{12.6} + \frac{(16-10.2)^2}{10.2} = 7.877; \qquad p = .096$$

(The value 7.877 comes from the unrounded expected counts.)

Cause and effect?
[Diagram: possible pathways between atherosclerosis and depression in the elderly: biological changes (?); lack of exercise, poor eating (?)]

Confounding?
[Diagram: the same pathways, with advancing age added as a possible confounder]

Cross-Sectional Studies
Advantages:
– cheap and easy
– generalizable
– good for characteristics that (generally) don't change, like genes or gender
Disadvantages:
– difficult to determine cause and effect
– problematic for rare diseases and exposures

2. Cohort studies:
Sample on exposure status and track disease development (for rare exposures).
Marginal probabilities (and rates) of developing disease for exposure groups are valid.

Example: The Framingham Heart Study
The Framingham Heart Study was established in 1948, when 5209 residents of Framingham, Mass, aged 28 to 62 years, were enrolled in a prospective epidemiologic cohort study.
Health and lifestyle factors were measured (blood pressure, weight, exercise, etc.). Interim cardiovascular events were ascertained from medical histories, physical examinations, ECGs, and review of interim medical records.

Example 2: Johns Hopkins Precursors Study (medical students 1948 through 1964)
http://www.jhu.edu/~jhumag/0601web/study.html
From the Johns Hopkins Magazine website (URL above).

Cohort Studies
[Diagram: Target population → Disease-free cohort → Exposed (Disease / Disease-free) and Not Exposed (Disease / Disease-free), followed over TIME]

The Risk Ratio, or Relative Risk (RR)

                   Exposure (E)   No Exposure (~E)
Disease (D)        a              b
No Disease (~D)    c              d
Total              a+c            b+d

$$RR = \frac{P(D/E)}{P(D/\sim E)} = \frac{a/(a+c)}{b/(b+d)}$$

The numerator is the risk to the exposed; the denominator is the risk to the unexposed.

Hypothetical Data

                           High Systolic BP   Normal BP
Congestive Heart Failure   400                400
No CHF                     1100               2600
Total                      1500               3000

$$RR = \frac{400/1500}{400/3000} = 2.0$$

Advantages/Limitations: Cohort Studies
Advantages:
– Allows you to measure true rates and risks of disease for the exposed and the unexposed groups.
– Temporality is correct (easier to infer cause and effect).
– Can be used to study multiple outcomes.
– Prevents bias in the ascertainment of exposure that may occur after a person develops a disease.
Disadvantages:
– Can be lengthy and costly! 60 years for Framingham.
– Loss to follow-up is a problem (especially if non-random).
– Selection bias: participation may be associated with exposure status for some exposures.

Case-Control Studies
Sample on disease status and ask retrospectively about exposures (for rare diseases).
Marginal probabilities of exposure for cases and controls are valid.
• Doesn't require knowledge of the absolute risks of disease
• For rare diseases, can approximate relative risk

Case-Control Studies
[Diagram: Target population → Disease (Cases): exposed in past / not exposed; No Disease (Controls): exposed / not exposed]

Example: the AIDS epidemic in the early 1980's
Early case-control studies among AIDS cases and matched controls indicated that AIDS was transmitted by sexual contact or blood products.
In 1982, an early case-control study matched AIDS cases to controls and found a positive association between amyl nitrites ("poppers") and AIDS; odds ratio of 8.6 (Marmor et al. 1982). This is an example of confounding.

Case-Control Studies in History
In 1843, Guy compared occupations of men with pulmonary consumption to those of men with other diseases (Lilienfeld and Lilienfeld 1979).
Case-control studies identified associations between lip cancer and pipe smoking (Broders 1920), breast cancer and reproductive history (Lane-Claypon 1926) and between oral cancer and pipe smoking (Lombard and Doering 1928). All rare diseases.
Case-control studies identified an association between smoking and lung cancer in the 1950's.

Case-control example
A study of the relation between body mass index and the incidence of age-related macular degeneration (Moeini et al. Br. J. Ophthalmol, 2005).
Methods: Researchers compared 50 Iranian patients with confirmed age-related macular degeneration and 80 control subjects with respect to BMI, smoking habits, hypertension, and diabetes. The researchers were specifically interested in the relationship of BMI to age-related macular degeneration.

Results
Table 2 Comparison of body mass index (BMI) in case and control groups

                             Case n = 50 (%)   Control n = 80 (%)   p Value
Lean (BMI < 20)              7 (14)            6 (7.5)              NS
Normal (20 ≤ BMI < 25)       16 (32)           20 (25)              NS
Overweight (25 ≤ BMI < 30)   21 (42)           36 (45)              NS
Obese (BMI ≥ 30)             6 (12)            18 (22.5)            NS

NS, not significant.

Corresponding 2x2 Table

           Overweight   Normal   Total
ARMD       27           23       50
No ARMD    54           26       80

What is the risk ratio here?
Tricky: There is no risk ratio, because we cannot calculate the risk of disease!!

The odds ratio…
We cannot calculate a risk ratio from a case-control study. BUT, we can calculate a measure called the odds ratio…

Odds vs. Risk

If the risk is…   Then the odds are…
½ (50%)           1:1
¾ (75%)           3:1
1/10 (10%)        1:9
1/100 (1%)        1:99

Note: An odds is always higher than its corresponding probability, unless the probability is 0 (at a probability of 100% the odds are undefined).

The Odds Ratio (OR)

                   Exposure (E)   No Exposure (~E)
Disease (D)        a              b                  a+b = cases
No Disease (~D)    c              d                  c+d = controls

The proportions of cases and controls are set by the investigator; therefore, they do not represent the risk (probability) of developing disease.

$$OR = \frac{P(E/D)\,/\,P(\sim E/D)}{P(E/\sim D)\,/\,P(\sim E/\sim D)} = \frac{\dfrac{a/(a+b)}{b/(a+b)}}{\dfrac{c/(c+d)}{d/(c+d)}} = \frac{a/b}{c/d} = \frac{ad}{bc}$$

The numerator is the odds of exposure in the cases; the denominator is the odds of exposure in the controls.

The same quantity can be read two ways:

$$OR = \frac{a/b}{c/d} = \frac{ad}{bc} = \frac{a/c}{b/d}$$

Left: odds of exposure for the cases over odds of exposure for the controls. Right: odds of disease for the exposed over odds of disease for the unexposed.

Proof via Bayes' Rule (optional)

$$\frac{P(E/D)\,/\,P(\sim E/D)}{P(E/\sim D)\,/\,P(\sim E/\sim D)} = \frac{\dfrac{P(D/E)P(E)}{P(D)} \Big/ \dfrac{P(D/\sim E)P(\sim E)}{P(D)}}{\dfrac{P(\sim D/E)P(E)}{P(\sim D)} \Big/ \dfrac{P(\sim D/\sim E)P(\sim E)}{P(\sim D)}} = \frac{P(D/E)\,/\,P(\sim D/E)}{P(D/\sim E)\,/\,P(\sim D/\sim E)}$$

The left side is the odds of exposure in the cases over the odds of exposure in the controls. The right side is the odds of disease in the exposed over the odds of disease in the unexposed, which is what we want!

The Odds Ratio (OR)

           Overweight   Normal
ARMD       a            b
No ARMD    c            d

$$OR = \frac{a/b}{c/d} = \frac{ad}{bc} = \frac{a/c}{b/d}$$

Left: odds of overweight for the cases over odds of overweight for the controls. Right: odds of ARMD for the overweight over odds of ARMD for the normal weight.

           Overweight   Normal
ARMD       27           23
No ARMD    54           26

$$OR = \frac{27/23}{54/26} = \frac{27 \times 26}{23 \times 54} = .57$$

Can be interpreted as: Overweight people have a 43% decrease in their ODDS of age-related macular degeneration (not statistically significant here).

The odds ratio is a good approximation of the risk ratio if the disease is rare.
If the disease is rare (affecting <10% of the population), then: OR ≈ RR.
WHY? If the disease is rare, the probability of it NOT happening is close to 1, and the odds is close to the risk. E.g.:

$$OR = \frac{1/19}{1/9} = .474; \qquad RR = \frac{1/20}{1/10} = .50$$

The rare disease assumption

$$OR = \frac{P(D/E)\,/\,P(\sim D/E)}{P(D/\sim E)\,/\,P(\sim D/\sim E)} \approx \frac{P(D/E)}{P(D/\sim E)} = RR$$

When a disease is rare: P(~D) = 1 - P(D) ≈ 1, so both P(~D/E) and P(~D/~E) are approximately 1.

The odds ratio vs. the risk ratio
[Figure: positions of the odds ratio and the risk ratio relative to 1.0 (null), in two panels: Rare Outcome and Common Outcome]

When is the OR a good approximation of the RR?
General Rule of Thumb: "OR is a good approximation as long as the probability of the outcome in the unexposed is less than 10%"
Prevalence of age-related macular degeneration is about 6.5% in people over 40 in the US (according to a 2011 estimate). So, the OR is a reasonable approximation of the RR.

Advantages/Limitations: Case-control studies
Advantages:
– Cheap and fast
– Efficient for rare diseases
Disadvantages:
– Getting comparable controls is often tricky
– Temporality is a problem (did risk factor cause disease or disease cause risk factor?)
– Recall bias

Inferences about the odds ratio…

Properties of the OR (simulation) (50 cases/50 controls/20% exposed)
If the Odds Ratio = 1.0 then with 50 cases and 50 controls, of whom 20% are exposed, this is the expected variability of the sample OR. Note the right skew.
[Figure: simulated sampling distribution of the sample OR]

Properties of the lnOR

$$\text{Standard deviation} = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$

Hypothetical Data

                     Amyl Nitrite Use   No Amyl Nitrite   Total
AIDS                 20                 10                30
Does not have AIDS   6                  24                30

$$OR = \frac{(20)(24)}{(6)(10)} = 8.0$$

$$95\%\ \text{CI} = \left( (8.0)\,e^{-1.96\sqrt{\frac{1}{20}+\frac{1}{6}+\frac{1}{10}+\frac{1}{24}}},\ \ (8.0)\,e^{+1.96\sqrt{\frac{1}{20}+\frac{1}{6}+\frac{1}{10}+\frac{1}{24}}} \right) = (2.47,\ 25.8)$$

Note that the size of the smallest 2x2 cell determines the magnitude of the variance.

When can the OR mislead?
Example: Does dementia predict death?
Dementia: The leading predictor of death in a defined elderly population. Neurology 2004; 62: 1156-1162
Among patients with dementia: 291/355 (82%) died
Among patients without dementia: 947/4328 (22%) died

Dementia study
Authors report OR = 16.23 (12.27, 21.48)
But the RR = 3.72
Fortunately, they do not dwell on the OR, but it could mislead if not interpreted correctly…

Better to give OR or RR?
From an RCT (prospective!) of a new diet drug, the authors showed the following table:
Odds Ratios for losing at least 5kg were:
4.0 (low dose vs. placebo)
20.9 (medium dose vs. placebo)
31.5 (high dose vs. placebo)

Better to give OR or RR?
Corresponding RRs are:
59%/29% = 2 (low dose vs. placebo)
87%/29% = 3 (medium dose vs. placebo)
91%/29% = 3 (high dose vs. placebo)

Summary of statistical tests for contingency tables

Table Size   Test or measures of association
2x2          risk ratio (cohort or cross-sectional studies)
             odds ratio (case-control studies)
             Chi-square
             difference in proportions
             Fisher's Exact test (cell size less than 5)
RxC          Chi-square
             Fisher's Exact test (expected cell size <5)

Fisher's Exact Test
Fisher's "Tea-tasting experiment"
Claim: Fisher's colleague (call her "Cathy") claimed that, when drinking tea, she could distinguish whether milk or tea was added to the cup first.
To test her claim, Fisher designed an experiment in which she tasted 8 cups of tea (4 cups had milk poured first, 4 had tea poured first).
Null hypothesis: Cathy's guessing abilities are no better than chance.
Alternative hypotheses:
Right-tail: She guesses right more than expected by chance.
Left-tail: She guesses wrong more than expected by chance.

Fisher's "Tea-tasting experiment"
Experimental Results:

               Guess poured first
Poured first   Milk   Tea   Total
Milk           3      1     4
Tea            1      3     4

Fisher's Exact Test
Step 1: Identify tables that are as extreme or more extreme than what actually happened:
Here she identified 3 out of 4 of the milk-poured-first teas correctly. Is that good luck or real talent?
The only way she could have done better is if she identified 4 of 4 correct.

Observed table:

               Guess poured first
Poured first   Milk   Tea   Total
Milk           3      1     4
Tea            1      3     4

More extreme table:

               Guess poured first
Poured first   Milk   Tea   Total
Milk           4      0     4
Tea            0      4     4

Step 2: Calculate the probability of the tables (assuming fixed marginals):

$$P(3) = \frac{\binom{4}{3}\binom{4}{1}}{\binom{8}{4}} = .229; \qquad P(4) = \frac{\binom{4}{4}\binom{4}{0}}{\binom{8}{4}} = .014$$

Step 3: To get the left-tail and right-tail p-values, consider the probability mass function of X, where X = the number of correct identifications of the cups with milk poured first:

$$P(4) = \frac{\binom{4}{4}\binom{4}{0}}{\binom{8}{4}} = .014$$
$$P(3) = \frac{\binom{4}{3}\binom{4}{1}}{\binom{8}{4}} = .229$$
$$P(2) = \frac{\binom{4}{2}\binom{4}{2}}{\binom{8}{4}} = .514$$
$$P(1) = \frac{\binom{4}{1}\binom{4}{3}}{\binom{8}{4}} = .229$$
$$P(0) = \frac{\binom{4}{0}\binom{4}{4}}{\binom{8}{4}} = .014$$

"Right-hand tail probability": P(X ≥ 3) = .229 + .014 = .243, so p = .243.
"Left-hand tail probability" (testing the alternative hypothesis that she's systematically wrong): P(X ≤ 3) = .014 + .229 + .514 + .229 = .986.
SAS also gives a "two-sided p-value", which is calculated by adding up all probabilities in the distribution that are less than or equal to the probability of the observed table ("equal or more extreme"). Here: .229 + .014 + .229 + .014 = .4857

Summary of statistical tests for contingency tables

Table Size   Test or measures of association
2x2          risk ratio (cohort or cross-sectional study)
             odds ratio (case-control study)
             Chi-square
             difference in proportions
             Fisher's Exact test (cell size less than 5)
RxC          Chi-square
             Fisher's Exact test (expected cell size <5)
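The 2x2 entries of the summary table can be sketched in Python using only the standard library. This is a minimal illustration, not the lecture's own code: the function names are hypothetical, the cell labels a, b, c, d follow the lecture's layout (a, b in the first row; c, d in the second), the odds-ratio interval is the log-scale interval used in the amyl nitrite example, and the chi-square p-value relies on the fact that a chi-square with 1 degree of freedom is a squared standard normal.

```python
from math import comb, erfc, exp, sqrt


def risk_ratio(a, b, c, d):
    """RR = [a/(a+c)] / [b/(b+d)]: columns are exposed/unexposed, rows are disease/no disease."""
    return (a / (a + c)) / (b / (b + d))


def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR = ad/bc with a 95% interval built on the log scale."""
    odds_ratio = (a * d) / (b * c)
    se_ln_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, odds_ratio * exp(-z * se_ln_or), odds_ratio * exp(z * se_ln_or)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 degree of freedom)."""
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = erfc(sqrt(stat / 2))  # P(chi-square with 1 df > stat)
    return stat, p_value


def fisher_right_tail(a, b, c, d):
    """P(X >= a) for the top-left cell with all margins fixed (hypergeometric)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    favorable = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return favorable / comb(n, col1)


# Data taken from the lecture's examples.
rr = risk_ratio(400, 400, 1100, 2600)              # hypothetical CHF cohort
or_est, or_lo, or_hi = odds_ratio_ci(20, 10, 6, 24)  # hypothetical amyl nitrite data
chi2, p_chi2 = chi_square_2x2(28, 53, 511, 1328)   # Rotterdam depression example
p_fisher = fisher_right_tail(3, 1, 1, 3)           # tea-tasting experiment

print(f"RR = {rr:.2f}")
print(f"OR = {or_est:.2f}, 95% CI ({or_lo:.2f}, {or_hi:.2f})")
print(f"chi-square = {chi2:.2f}, p = {p_chi2:.2f}")
print(f"Fisher right-tail p = {p_fisher:.3f}")
```

Run as written, the printed values should match the lecture's figures up to rounding. For real analyses a statistics package is the safer choice; the point here is only to show that each entry in the summary table reduces to a few lines of arithmetic on the four cell counts.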