Biostatistics course
Part 13: Effect measures in 2 x 2 tables

Dr. Sc. Nicolas Padilla Raygoza
Department of Nursing and Obstetrics
Division of Health Sciences and Engineering
University of Guanajuato, Campus Celaya-Salvatierra

Biosketch
Medical Doctor, Autonomous University of Guadalajara.
Pediatrician, Mexican Council of Certification on Pediatrics.
Postgraduate Diploma in Epidemiology, London School of Hygiene and Tropical Medicine, University of London.
Master of Science with an emphasis in Epidemiology, Atlantic International University.
Doctor of Science with an emphasis in Epidemiology, Atlantic International University.
Associate Professor B, Department of Nursing and Obstetrics, Division of Health Sciences and Engineering, University of Guanajuato, Campus Celaya-Salvatierra, Mexico.
padillawarm@gmail.com

Competencies
The reader will obtain the Risk Ratio or Odds Ratio from a 2 x 2 table.
He (she) will calculate the 95% confidence interval for the RR or OR.
He (she) will identify potential confounders and/or interactions.
He (she) will apply the Mantel-Haenszel method for the RR, OR and Chi-squared.

Introduction
In Part 12 of the course, we tested the association between two categorical variables. Now we review the methods used to measure that association. We will work with binary variables, so we will use 2 x 2 tables.

Example
A nurse in a poor area of Mexico was informed that many local children attending the nursery had respiratory infections. She designed a cohort study to investigate the problem. Over the following years, 1000 children were followed. The main research question was: Is attending nursery associated with respiratory infection?

                     Respiratory infection
Attending nursery    Yes, n (%)    No, n (%)     Total
Yes                  37 (33.9)     72 (66.1)     109
No                   43 (4.8)      848 (95.2)    891
Total                80 (8.0)      920 (92.0)    1000

Risk Ratio (RR)
In health research, the term "risk" is used instead of proportion. For example: the risk of infection among children attending the nursery was 33.9%.
Thus, the risk ratio is the ratio of two proportions.
The risk of respiratory infection in children attending the nursery is: 37 / (37 + 72) = 37/109 = 0.339
The risk of respiratory infection in children not attending the nursery is: 43 / (43 + 848) = 43/891 = 0.048
The risk ratio (RR) is the ratio of these two risks.
Risk ratio = 0.339 / 0.048 = 7.06

In general, the risk ratio can be obtained with the following formula, where a, b, c and d are the frequencies in the 2 x 2 table.

            Outcome
Exposure    Yes    No     Total
Yes         a      b      a+b
No          c      d      c+d
Total       a+c    b+d    N

$$RR = \frac{a/(a+b)}{c/(c+d)}$$

Odds Ratio (OR)
The Odds Ratio (OR) is the ratio of the odds of the outcome among the exposed to the odds of the outcome among the non-exposed. The odds are the number with the outcome divided by the number without it; they are not a probability.
The odds of infection among children attending the nursery are: 37 / 72 = 0.514
The odds of infection among children not attending the nursery are: 43 / 848 = 0.051
The Odds Ratio is the ratio of these two odds: OR = 0.514 / 0.051 = 10.08
In general, the Odds Ratio is found with the following formula:

$$OR = \frac{a/b}{c/d} = \frac{ad}{bc}$$

Confidence intervals
In the analysis of the data on children attending the nursery or not, we have the option of using the RR or the OR to measure the effect of nursery attendance. Each value is only an estimate, so these values should be reported with confidence intervals.
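As a minimal sketch, the risk ratio and odds ratio for the nursery example can be computed in Python from the four cell counts. The function names `risk_ratio` and `odds_ratio` are illustrative choices, not part of the course material. Because the code does not round the intermediate risks and odds, it gives an RR of about 7.03 and an OR of about 10.13, slightly different from the 7.06 and 10.08 obtained above from rounded values.

```python
def risk_ratio(a, b, c, d):
    """Risk in the exposed divided by risk in the non-exposed."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


def odds_ratio(a, b, c, d):
    """Odds in the exposed divided by odds in the non-exposed (ad / bc)."""
    return (a * d) / (b * c)


# Nursery example: rows are exposure (attending / not attending),
# columns are outcome (respiratory infection yes / no).
a, b, c, d = 37, 72, 43, 848

rr = risk_ratio(a, b, c, d)
odds_r = odds_ratio(a, b, c, d)

print(f"RR = {rr:.2f}")
print(f"OR = {odds_r:.2f}")
```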
An approximate 95% confidence interval for the RR is found using the following formula:
Minimum value: RR / EF
Maximum value: RR x EF
where EF is the error factor:

$$EF = \exp\left(1.96\sqrt{\frac{1}{a} - \frac{1}{a+b} + \frac{1}{c} - \frac{1}{c+d}}\right)$$

The CI for the data on children who attend the nursery or not is:
EF = exp(1.96 x √(1/37 - 1/109 + 1/43 - 1/891)) = 1.48
RR = 7.06
Minimum value: 7.06 / 1.48 = 4.77
Maximum value: 7.06 x 1.48 = 10.45
95% CI = 4.77 to 10.45

An approximate 95% confidence interval for the OR is found using the following formula:
Minimum value: OR / EF
Maximum value: OR x EF

$$EF = \exp\left(1.96\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right)$$

The CI for the data on children who attend the nursery or not is:
EF = exp(1.96 x √(1/37 + 1/72 + 1/43 + 1/848)) = 1.65
OR = 10.08
Minimum value: 10.08 / 1.65 = 6.11
Maximum value: 10.08 x 1.65 = 16.63
95% CI = 6.11 to 16.63

Which measure is best?
Risk Ratios are calculated for cross-sectional and cohort studies. The formula for the 95% confidence interval for the RR requires larger sample sizes than the one for the OR.
ORs are calculated for case-control and cross-sectional studies. In case-control studies it is not possible to calculate risks, and therefore the RR cannot be calculated.
There is an advantage in using the OR: it is a consistent measure of effect, unlike the RR.

Example (cont.)
The data on the Mexican children showed a strong association between exposure (attending nursery) and outcome (respiratory infection). However, such an association may be confounded by other factor(s).
For example, although children who attend the nursery seem to have a 7 times higher risk of respiratory infection, the cause of the infection may be something else that is associated with attending the nursery. In other words, attending the nursery may be a marker of another exposure that causes respiratory infection. If this is true, we say that the association between respiratory infection and nursery attendance is confounded.

How to identify a potential confounder?
To evaluate a potential confounder, we should consider three aspects:
The exposure
The outcome
The confounder

Example
The nurse is interested in the association between nursery attendance and the presence of respiratory infection, but is aware that children might be exposed to other factors that cause respiratory infection.
For example, overcrowding at home is a risk factor for respiratory infection. It is therefore a potential confounder of the association between nursery attendance and respiratory infection.

Confounders
For a variable to be a potential confounder, it must meet three conditions. It must:
1. be an independent risk factor for the outcome of interest;
2. be associated with the exposure of interest;
3. not be on the causal pathway between exposure and outcome.

How do we check these conditions in the study of Mexican children?

Condition 1 for confounding: risk factor for the outcome of interest
Is there an association between overcrowding and respiratory infection (RI)?

Overcrowding at home    RI Yes    RI No    Risk of RI
Yes                     54        55       54/109 = 0.5
No                      21        870      21/891 = 0.02

RR = 25; 95% CI = 15.72 to 39.75; X2 = 311.67; P << 0.05

Condition 2 for confounding: association with the exposure
Is there an association between overcrowding and nursery attendance?

                        Attendance at nursery
Overcrowding at home    Yes    No
Yes                     43     66
No                      35     856

X2 = 170.39; P << 0.05

Condition 3 for confounding: Is the potential confounder on the causal pathway?
In this example, it is unlikely that overcrowding at home is a consequence of nursery attendance, so it does not lie on the pathway from exposure to outcome.

Do we have a confounder?
In this study, overcrowding has satisfied the three conditions necessary for a confounding variable:
It is an independent risk factor for the outcome of interest. Overcrowding is associated with respiratory infection.
It is associated with the exposure of interest.
Overcrowding is associated with attendance at the nursery.
It is not on the causal pathway. Overcrowding is unlikely to be a consequence of attendance at the nursery.

Stratified tables
Now we know that the data must be analyzed further to take the effect of overcrowding into account. To adjust for a confounding variable, we stratify the 2 x 2 table of interest.
The unstratified table is called the raw table. It can be divided into strata defined by the confounding variable.
The sample is divided into two groups, within each of which the overcrowding status is the same. The two groups are: overcrowding and without overcrowding.

We want to find out whether nursery attendance is associated with respiratory infection when comparing children within the same category of overcrowding.
The raw table for the relationship between respiratory infection and nursery attendance is:

                     Respiratory infection
Attending nursery    Yes, n (%)    No, n (%)     Total
Yes                  37 (33.9)     72 (66.1)     109
No                   43 (4.8)      848 (95.2)    891
Total                80 (8.0)      920 (92.0)    1000

The tables stratified by overcrowding are shown below.

Overcrowding
               Respiratory infection
               Yes    No     Total
Nursery Yes    61     14     75
Nursery No     5      21     26
Total          66     35     101
RR = 4.23; 95% CI 1.91 to 9.37; X2 = 32.88; p < 0.0001

Without overcrowding
               Respiratory infection
               Yes    No     Total
Nursery Yes    10     24     34
Nursery No     4      861    865
Total          14     885    899
RR = 63.6; 95% CI 21.01 to 192.56; X2 = 178.84; p < 0.0001

Do you think that attendance at nursery is a risk factor for respiratory infection among children with overcrowding?
Yes, children attending the nursery have about 4 times the risk of respiratory infection (RR = 4.23) of those who do not attend. The p value indicates a strong association between nursery attendance and respiratory infection in the group with overcrowding.
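The stratum-specific RRs and 95% confidence intervals in the stratified tables can be reproduced with the error-factor formula for the RR given earlier. The following Python sketch is illustrative only: the function name `rr_with_ci` and the dictionary layout are assumptions made for this example, and each stratum is entered as the cell counts (a, b, c, d).

```python
import math


def rr_with_ci(a, b, c, d, z=1.96):
    """Return (RR, lower limit, upper limit) using the error-factor method."""
    rr = (a / (a + b)) / (c / (c + d))
    # Standard error of log(RR), then the error factor EF = exp(z * SE).
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    ef = math.exp(z * se_log_rr)
    return rr, rr / ef, rr * ef


# Cell counts (a, b, c, d) for each stratum, taken from the stratified tables.
strata = {
    "Overcrowding": (61, 14, 5, 21),
    "Without overcrowding": (10, 24, 4, 861),
}

results = {name: rr_with_ci(*cells) for name, cells in strata.items()}

for name, (rr, low, high) in results.items():
    print(f"{name}: RR = {rr:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

The same function applies to any 2 x 2 table in which all four cells are greater than zero; a zero cell would make the standard error undefined.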
Do you think that attendance at nursery is a risk factor for respiratory infection in the group without overcrowding?
Yes, children attending the nursery have more than 63 times the risk of respiratory infection (RR = 63.6) of those not attending. The p value indicates a strong association between nursery attendance and respiratory infection in this group.
Within each stratum, the association between nursery attendance and respiratory infection is now independent of overcrowding at home.

Comparison of results
How do these results compare with those of the raw table?
The raw table shows a strong relationship between nursery attendance and respiratory infection. The RR differs between the two stratified tables, but the association remains statistically significant in both.

                        RR      95% CI             X2        P-value
Raw                     7.06    4.77 to 10.45      111.88    <0.05
Overcrowding            4.23    1.91 to 9.37       32.88     <0.05
Without overcrowding    63.6    21.01 to 192.56    178.84    <0.05

Adjusted Risk Ratios
The nurse does not want to show the data divided into strata; she prefers an overall estimate of the effect of nursery attendance on respiratory infection, adjusted for overcrowding. This can be done by calculating the RR using the Mantel-Haenszel method.
First, look at the 2 x 2 table in each stratum, where the subscript e indexes the stratum.

            Disease
Exposure    Yes    No     Total
Yes         a_e    b_e
No          c_e    d_e
Total                     n_e

Risk Ratio from Mantel-Haenszel
The adjusted (summary) RR can be obtained with:

$$RR_{MH} = \frac{\sum a_e (c_e + d_e)/n_e}{\sum c_e (a_e + b_e)/n_e}$$

This gives a weighted average of the RRs estimated in each table, in which tables with a larger sample size carry more weight.
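The Mantel-Haenszel formula above can be sketched in Python as follows. This is a minimal illustration, assuming each stratum is supplied as a tuple (a, b, c, d) of cell counts; the function name `mantel_haenszel_rr` is hypothetical. It is applied here to the overcrowding and non-overcrowding tables shown earlier and, without rounding the intermediate terms, gives an adjusted RR of about 6.55.

```python
def mantel_haenszel_rr(strata):
    """Mantel-Haenszel summary risk ratio over a list of (a, b, c, d) tables."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * (c + d) / n    # a_e (c_e + d_e) / n_e
        denominator += c * (a + b) / n  # c_e (a_e + b_e) / n_e
    return numerator / denominator


# (a, b, c, d) for the overcrowding and non-overcrowding strata.
strata = [
    (61, 14, 5, 21),
    (10, 24, 4, 861),
]

rr_mh = mantel_haenszel_rr(strata)
print(f"Mantel-Haenszel adjusted RR = {rr_mh:.2f}")
```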
Adjusted Risk Ratio
We calculate the overcrowding-adjusted RR with the Mantel-Haenszel formula:

Overcrowding
               Respiratory infection
               Yes    No     Total
Nursery Yes    61     14     75
Nursery No     5      21     26
Total          66     35     101

Non-overcrowding
               Respiratory infection
               Yes    No     Total
Nursery Yes    10     24     34
Nursery No     4      861    865
Total          14     885    899

RR (Mantel-Haenszel) = [61 x (5 + 21)/101 + 10 x (4 + 861)/899] / [5 x (61 + 14)/101 + 4 x (10 + 24)/899]
= (15.70 + 9.62) / (3.71 + 0.15) = 25.32 / 3.86 = 6.56

Adjusted Odds Ratio
The adjusted OR is calculated in a similar way to the adjusted RR.

$$OR_{MH} = \frac{\sum a_e d_e / n_e}{\sum b_e c_e / n_e}$$

            Disease
Exposure    Yes    No     Total
Yes         a_e    b_e
No          c_e    d_e
Total                     n_e

In a cross-sectional study on the use of quinfamide after amoebic dysentery, the number of carriers of Entamoeba histolytica was reported.

                  Non-carrier    Carrier    Total
Quinfamide        100            54         154
Non-quinfamide    15             72         87
Total             115            126        241

We calculate the OR adjusted for area of residence with the Mantel-Haenszel formula:

Urban
                  Non-carrier    Carrier    Total
Quinfamide Yes    35             39         74
Quinfamide No     10             51         61
Total             45             90         135

Rural
                  Non-carrier    Carrier    Total
Quinfamide Yes    65             14         79
Quinfamide No     5              21         26
Total             70             35         105

OR (Mantel-Haenszel) = [(35 x 51/135) + (65 x 21/105)] / [(39 x 10/135) + (14 x 5/105)]
= (13.2 + 13) / (2.89 + 0.67) = 26.2 / 3.56 = 7.4

Mantel-Haenszel X2
The nurse now knows that the association between respiratory infection and nursery attendance remains after adjusting for overcrowding, the confounding variable. Now she wants to calculate a Chi-squared test for the significance of this association, adjusted for the confounder. This can be done by calculating the Mantel-Haenszel X2 test.
The null hypothesis is that there is no association between nursery attendance and respiratory infection.
H0: OR = 1.

$$X^2_{MH} = \frac{\left[\sum a_e - \sum E(a_e)\right]^2}{\sum V(a_e)}$$

We go step by step, beginning with the 2 x 2 table of each stratum.

            Disease
Exposure    Yes    No     Total
Yes         a_e    b_e
No          c_e    d_e
Total                     n_e

The Mantel-Haenszel Chi-squared test pools the information from the individual tables of all strata into a single test.
To calculate it, we need three values from each table:
a_e: the number of ill and exposed
E(a_e): the expected value of a_e
V(a_e): the variance (standard error squared) of a_e
where

$$E(a_e) = \frac{\text{row total} \times \text{column total}}{\text{grand total}} = \frac{(a_e + b_e)(a_e + c_e)}{n_e}$$

$$V(a_e) = \frac{(a_e + b_e)(c_e + d_e)(a_e + c_e)(b_e + d_e)}{n_e^2 (n_e - 1)}$$

Example
Overcrowding table
a = 61
E(a) = 75 x 66 / 101 = 49.01
V(a) = (75 x 66 x 26 x 35) / (101² x (101 - 1)) = 4.42

Non-overcrowding table
a = 10
E(a) = 34 x 14 / 899 = 0.53
V(a) = (34 x 14 x 865 x 885) / (899² x (899 - 1)) = 0.50

To obtain the Mantel-Haenszel Chi-squared (the Chi-squared adjusted for overcrowding), we add these values over the two strata and apply the formula:

        Overcrowding    Non-overcrowding    Total
a       61              10                  71
E(a)    49.01           0.53                49.54
V(a)    4.42            0.50                4.92

Mantel-Haenszel X2 = (71 - 49.54)² / 4.92 = 93.60

Confounding or no confounding
How do we decide whether there is confounding?
There are no statistical tests to demonstrate confounding.
We calculate the statistical tests and the measures of effect for the raw and stratified tables. Then we calculate the summary (adjusted) measures, compare them with the raw ones, and conclude whether or not there is confounding.
If there is an important difference between the raw and adjusted estimates, we say that the association of interest is confounded by another factor.
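The adjusted odds ratio and the Mantel-Haenszel Chi-squared worked through above can be checked with a short Python sketch. The function names `mantel_haenszel_or` and `mantel_haenszel_chi2` are illustrative assumptions, and each stratum is entered as a tuple (a, b, c, d). Without intermediate rounding, the adjusted OR for the quinfamide data is 7.375 and the adjusted X2 for the nursery data is about 93.65, close to the 7.4 and 93.60 obtained by hand.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio over a list of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


def mantel_haenszel_chi2(strata):
    """Mantel-Haenszel Chi-squared: [sum(a) - sum(E(a))]^2 / sum(V(a))."""
    sum_a = sum_expected = sum_variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        sum_a += a
        sum_expected += (a + b) * (a + c) / n
        sum_variance += (
            (a + b) * (c + d) * (a + c) * (b + d) / (n ** 2 * (n - 1))
        )
    return (sum_a - sum_expected) ** 2 / sum_variance


# Quinfamide example, stratified by residence: urban, then rural.
quinfamide_strata = [(35, 39, 10, 51), (65, 14, 5, 21)]
or_mh = mantel_haenszel_or(quinfamide_strata)

# Nursery example, stratified by overcrowding: overcrowding, then non-overcrowding.
nursery_strata = [(61, 14, 5, 21), (10, 24, 4, 861)]
chi2_mh = mantel_haenszel_chi2(nursery_strata)

print(f"Adjusted OR = {or_mh:.3f}")
print(f"Mantel-Haenszel X2 = {chi2_mh:.2f}")
```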
Let us look at the data on children attending the nursery and respiratory infection. After adjusting for overcrowding, the RR decreases from 7.06 to 6.56.

Possible effects of confounding
Generally there is more than one confounder. Confounders can have different effects:
The association under study may be significant before adjusting for a confounder and not significant after.
The association may remain significant after adjusting for a confounder, but with a less significant p-value.
The strata can show opposite results; in this case it is better to show the stratified results. This is interaction, or effect modification.
A confounder can hide an existing relationship.

References
1. Last JM. A dictionary of epidemiology. 4th ed. New York: Oxford University Press; 2001: 173.
2. Kirkwood BR. Essentials of medical statistics. Oxford: Blackwell Science; 1988: 14.
3. Altman DG. Practical statistics for medical research. Boca Raton: Chapman & Hall/CRC; 1991: 1-9.