2: Continuing with unadjusted Effect Estimates

Applied Epidemiology 304 Tutorial Guide

Taught by: Simon Thornley

Adapted by Simon Thornley from material initially prepared by Professor Robert Scragg, University of Auckland.

Table of Contents
1: Exploring data and unadjusted (univariate) effect estimates
2: Continuing with unadjusted Effect Estimates
3: Controlling for confounding using stratification (Mantel-Haenszel method)
4: Interaction or effect modification: using 2x2 tables
5: Introduction to logistic regression
6: Using logistic regression to investigate effect modification
7: Using logistic regression to examine effect modification (2)
8: Sample size using Statcalc
References

New Zealand Cot Death Study

This study was undertaken in the early 1980s to determine the cause of an apparent rise in the incidence of cot death in New Zealand.[1] The researchers selected cases of cot death and administered a questionnaire to parents, a sample of which is contained in the following dataset used for this exercise. Control infants were recruited from the community at the same time as cases (incidence-density sampling). A variety of exposures, thought to cause or contribute to cot death were considered, along with a number of confounding variables.
In this tutorial, we will first consider the effect of bed sharing on cot death. Then we consider the possible confounding effect of socio-economic status, and how this variable affects the association with the exposure of interest. We will then consider the effect of maternal smoking on the relationship between bed sharing (exposure) and cot death (outcome). The data used in this session are a subsample of the actual data used for the publications, so the analyses won’t match exactly what was reported in medical journals.

In addition to the output derived from Epi Info, I have also produced some supplemental figures with SPAN[2] and the Epicalc[3] utility of the R-program. These figures cannot be produced with Epi Info and are only included here to (try to!) improve your understanding of the material. They are not examinable.

SIDS = Sudden Infant Death Syndrome

Table: Data dictionary for the Excel file SIDS_EpiInfo.

Variable Name | Description | Values
REGION | Region of the country | 1 = Auckland; 2 = Waikato; 4 = Wellington; 5 = Christchurch; 6 = Dunedin
CASE | Case control status | 1 = case; 2 = control
ETHNIC | Ethnicity | 1 = Maori; 2 = Pacific Island; 3 = European
SEX | Gender of infant | 1 = Male; 2 = Female
INFANT_AGE | Infant’s age at death or interview (weeks) | Continuous variable
INFANT_AGE_GRP | Infant’s age at death or interview (weeks): grouped | 1 = <13; 2 = 13 to 19; 3 = 20 to 25; 4 = >26
BIRTH_WT | Infant’s birth weight (gms) | Continuous variable
BIRTH_WT_GRP | Infant’s birth weight (gms): grouped | 1 = <2500; 2 = 2500 to 2999; 3 = 3000 to 3499; 4 = >3500
GESTATION | Length of pregnancy (weeks) | Continuous variable
MOTHER_AGE | Mother’s age at birth (years) | Continuous variable
MOTHER_AGE_GRP | Mother’s age at birth (years): grouped | 1 = ≤19; 2 = 20-24; 3 = 25-29; 4 = ≥30
ANTINAT | 1st attendance at antenatal clinic | 1 = <3 months; 2 = >3 months
OCCUPATION | Household SES category | 1 = 1 & 2 (high); 2 = 3 & 4; 3 = >5 (low)
SEASON | Season of the year | 1 = Jan-Feb; 2 = Dec-March; 3 = Nov-Apr; 4 = Oct-May; 5 = Sept-June; 6 = Aug-July
BEDSHARE | Bed share in the last sleep | 1 = Yes; 2 = No
SLEEP_POSITION | Position in last sleep | 1 = back; 2 = side; 3 = front, face down; 4 = front, face to side; 5 = other
DUMMY | Used dummy in last sleep | 1 = Yes; 2 = No
MAIN_MILK | Main type of milk drunk by baby in last 2 days | 1 = breast; 2 = bottled cow’s milk; 3 = modified cow’s milk; 4 = soya based milk; 5 = goat’s milk; 6 = other special milks
MOTHER_TOBACCO | Mother smoked cigarettes in the last 2 weeks | 1 = Yes; 2 = No; 3 = occasional <1/day
FATHER_TOBACCO | Father smoked cigarettes in the last 2 weeks | 1 = Yes; 2 = No; 3 = occasional <1/day
MOTHER_CANNABIS | Mother had cannabis since birth of baby | 1 = Yes; 2 = No; 3 = chose not to answer
CANNABIS_FREQ | Frequency of mother’s cannabis use | 1 = daily; 2 = weekly; 3 = monthly; 4 = less often

1: Exploring data and unadjusted (univariate) effect estimates

Use TABLES command to calculate unadjusted odds ratios
Use SELECT & CANCEL SELECT to select two exposure levels for calculating odds ratios, if there are more than two exposure levels

Commands in this Lesson

The following commands are used in this lesson.

READ/IMPORT
READ is the most commonly used command. The Read (Import) command changes the current data source and/or the current project. It removes any standard defined variables. The READ command operates on many different types of data. Epi Info™ can read 24 different types of files. Located in the Data folder.

DISPLAY
Use this command to display table, view, and database information. Use the display option Variables Currently Available to see all the variables in the dataset, including names, field types, and format information. Use Display, prior to merging or creating statistics, to ensure that field types and variable names have been coded as needed. Located in the Variables folder.

LIST
The List command creates a listing of the current data table. Lists can be customized to list all, exclude, or show specific records. Located in the Statistics folder.
SELECT
The Select command specifies a condition that must be true for a record to be processed. Use it to select a set of records for analysis. For example, select records based on gender or zip code. Located in the Select/If folder.

CANCEL SELECT
The Cancel Select command cancels a previous SELECT command. Located in the Select/If folder.

Command for calculating the crude odds ratio and Mantel-Haenszel odds ratio (or relative risk)

TABLES
Use this command to create frequencies or counts (total numbers) of categorical variables. A categorical variable is one in which each value falls into one of a set of groups (e.g. case or control status, ethnicity, bed sharing), as opposed to a measured, numeric variable, such as age. Numeric variables can, of course, be divided into categories to make them ‘categorical’. Tables can help determine how likely it is that a risk factor is linked to an outcome. For these values to have their accepted epidemiological meanings, the values representing presence of the exposure (independent variable) and of the outcome (dependent variable) must appear in the first row and column of the table. Epi Info yes/no variables are automatically sorted. Values of the first selected variable will appear across the top of the table, and those of the second selected variable will be on the left hand side (margin) of the table.

What does the output mean?

Normally cells contain counts of records matching the values in the corresponding marginal labels. For 2x2 tables, the command produces odds ratios and risk ratios. For tables other than 2x2, Chi-square statistics are computed. The p-value is the probability of seeing an association at least as strong as the one observed (measured by the odds ratio) if chance alone were operating (i.e. if there were no relationship between exposure and outcome). If the p-value is very low, the observed association between exposure and outcome is very unlikely to be due to chance alone.
A low p-value (<0.05) means that the association between the risk factor and disease is unlikely to be due to chance alone.

When measuring associations between exposures and outcomes, it is important to consider whether observed effects are due to “confounding”. This occurs when an observed association, such as between an exposure (bed sharing) and an outcome (cot death), may be explained by the presence of a third factor (socioeconomic status) which is linked to both the exposure and the outcome, and accounts for the observed effect. This is shown diagrammatically below, in which the association between bed sharing and cot death (dashed line) may be explained if socioeconomic status is linked to both bed sharing and cot death. In this series of tutorials, we introduce two methods of controlling for confounding: firstly, stratification (the Mantel-Haenszel method), and secondly, regression modelling.

Is ethnicity associated with risk of cot death (SIDS)?

You are going to use the TABLES command to examine the relationship between two or more categorical variables. You want to see if risk of cot death is associated with ethnicity.

1. READ the Excel file called SIDS_EpiInfo. In the READ dialogue box, select Excel 8.0 in the DATA_FORMATS drop down menu, then navigate to the path with the SIDS_EpiInfo.xls file. Select SIDS under the WORKSHEETS space, then click OK.

Exploring data in Epi Info

Before you start any analysis, it is worthwhile visualising the data to make sure the values are in the correct range and do not contain any errors. We will briefly cover a couple of useful commands. In this session, we are most interested in the variables BEDSHARE, CASE, and ETHNIC.

2. Let us look at the distribution of these variables. We will start with BEDSHARE. To get a graphic display and a count of the number of records in each category, use the “Frequencies” command to select BEDSHARE. Press OK.
You should see something like this:

You can see that roughly half the population bed share (BEDSHARE=1) and the other half do not (BEDSHARE=2). In the same way we can examine ETHNIC and CASE.

What proportion of the study are European (ETHNIC=3), Maori (ETHNIC=1) and Pacific (ETHNIC=2)? What was the proportion of cases (CASE=1) to controls (CASE=2) in the study?

Although we will not look at continuous data until much later, you might ask how we can explore data that are continuous, for example, MOTHER_AGE. If you use the “Frequencies” command, you get a lot of output. A better method is to use the “graph” function, under “statistics”. Select “histogram” from the “graph type” drop down menu on the top left of the window, then the variable “MOTHER_AGE” from the “main variable” drop down box. Then press “OK”.

This histogram tells you a lot about this variable. For instance, the range of values is between 13 and 45. The extremes are believable, so you do not suspect a coding error. Also, you see most of the values are between 25 and 31 years. Again, this makes sense. If you were to categorise this variable, the histogram would help decide where to select cut points so that you get roughly the same number of individuals in each category, or at least enough to get adequate statistical power.

Having decided that the quality of the data is ok, we will now turn to calculating some basic effect measures.

3. From the Command Tree Statistics folder, click Tables. The Tables dialog box opens.
4. From the Exposure Variable drop-down, select ETHNIC (they are listed in alphabetical order).
5. From the Outcome Variable drop-down, select CASE.
6. Click OK. Results appear in the Output window.

The output is a table with 2 disease categories (cases and controls), and three exposure levels. No odds ratio or relative risk values are shown because there are more than two exposure levels.
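As a rough cross-check of the 3x2 table outside Epi Info, the column percentages can be recomputed by hand. The sketch below assumes the cell counts printed in the scaled rectangle diagrams of this guide (165/262 for Maori, 23/133 for Pacific Island, 183/1096 for European, cases/controls); the variable and function names are illustrative only and are not Epi Info commands.

```python
# Counts of ETHNIC by CASE as (cases, controls), taken from the scaled
# rectangle diagrams in this guide. Keys are the ETHNIC codes.
counts = {
    1: (165, 262),   # Maori
    2: (23, 133),    # Pacific Island
    3: (183, 1096),  # European
}

total_cases = sum(cases for cases, _ in counts.values())
total_controls = sum(controls for _, controls in counts.values())


def column_percent(count, column_total):
    """Percentage of a column total, rounded to one decimal place."""
    return round(100 * count / column_total, 1)


# Column percentages: share of each ethnic group among cases and among controls.
col_pct = {
    ethnic: (column_percent(cases, total_cases),
             column_percent(controls, total_controls))
    for ethnic, (cases, controls) in counts.items()
}

for ethnic, (pct_cases, pct_controls) in col_pct.items():
    print(f"ETHNIC={ethnic}: {pct_cases}% of cases, {pct_controls}% of controls")
```

Under these counts, Europeans make up 49.3% of cases and 73.5% of controls, and the cell counts add up to the 1862 records read from the Excel file.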
To calculate odds ratios, you have to select only two ethnic groups for your analysis; firstly by comparing Maori infants with European, and then by comparing Pacific Island infants with European. European infants have been chosen as the reference category as they are a large group with a low risk of cot death: the column percent for Europeans among controls (73.5%) is much higher than among cases (49.3%). The same strategy can be used for comparing other variables that have many categories, such as age.

Exposure: Maori vs European

7. From the Select/If folder, click Select. The Select dialogue box opens. Choose Ethnic=1 (Maori) and Ethnic=3 (European), and click OK. Note that the Record Count is now 1706 (compared to 1862 when the Excel file was originally read).
8. Click Tables. The Tables dialog box opens. From the Exposure Variable drop-down, select ETHNIC. From the Outcome Variable drop-down, select CASE, and click OK. Results appear in the output window.

The odds ratio and risk ratio are now both shown. Odds ratios are the appropriate values for a case control study. The cross-product odds ratio shows that Maori infants have 3.77 times the odds of cot death of European infants. This is a statistically significant result, since the 95% confidence interval (2.94, 4.84) does not include the reference value of 1. The P-value gives similar information to the confidence interval, but answers the question “How weird is this result if ethnic group exerted no effect on cot death?”. The P-value is quoted as “0.000000”, or “<0.00001” if you want to be technically correct: even if a result is very weird, it can never be impossible. The low P-value indicates that if ethnic group (Maori vs European) had no effect on cot death (null hypothesis), this result would be very, very weird, or almost impossible! Therefore the null hypothesis is rejected, and we think that ethnic group does influence risk of cot death.
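The odds ratio and confidence interval reported for this comparison can be reproduced by hand. The sketch below uses the four counts shown in the scaled rectangle diagram for this comparison (165 Maori cases, 262 Maori controls, 183 European cases, 1096 European controls) and assumes a log-based (Woolf) confidence interval; the guide does not state which interval method Epi Info uses, so treat that choice, and the function name, as assumptions.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Cross-product odds ratio with a Woolf (log-based) confidence interval.

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio (Woolf's method, an assumption here).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Maori (exposed) vs European (unexposed), counts from the diagram in this guide.
maori_or, maori_lo, maori_hi = odds_ratio_with_ci(165, 262, 183, 1096)
print(f"OR = {maori_or:.2f} (95% CI: {maori_lo:.2f}, {maori_hi:.2f})")
# prints: OR = 3.77 (95% CI: 2.94, 4.84)
```

With these counts the sketch agrees with the values quoted from the Epi Info output. The same function can be reused for any 2x2 table, provided none of the four cells is zero.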
Most epidemiologists report odds ratios (or risk ratios) rounded to 2 decimal places.

Diversion! Visualising the association between ethnic group and cot death...

I show an alternative way of displaying this information which may help you understand the concept of odds ratios. The first scaled rectangle diagram shows a box which is proportional to the total study population (Pacific excluded). Within this is a box proportional to the number of cases (labeled “CASES=1”, about ¼ of the study), with those outside this box being controls. The number of Maori (also just over a quarter of the population) is displayed in the lighter coloured box, with all those outside this square being European (white: controls; dark blue: cases). This gives a visual display of the exposure (ethnic group) and outcome (case status: cot death or no cot death). The odds ratio is calculated from the numbers displayed in the diagram:

$$\frac{\text{Maori cases (165)} \,/\, \text{Maori controls (262)}}{\text{Euro cases (183)} \,/\, \text{Euro controls (1096)}}$$

You can readily appreciate that the ratio of cases to controls, represented by the areas of the overlapping boxes below, is much higher for Maori than for European.

This is very different to what would be expected if case (cot death) status was unrelated to ethnic group, keeping the proportion of Maori and the proportion of cases to non-cases constant (illustrated in the following diagram). You can see that, in this case, the odds ratio for independence (no effect) between cot death and ethnicity would be:

$$\frac{\text{Maori cases (87)} \,/\, \text{Maori controls (340)}}{\text{Euro cases (261)} \,/\, \text{Euro controls (1018)}}$$

Put yet another way (!), one of the quantities we are comparing in the odds ratio is the odds of being Maori (rather than European) if one is a case. We calculate this as 165/183 = 0.90 (point estimate). That is, if you are a case, you are about as likely to be Maori as European in this study. You can imagine that we could repeat this study many times over.
You would not usually get exactly the same result, but something close. We use a mathematical distribution (the binomial) as an approximation of what may be expected if you do the study over and over again (see below). The red line is the median value or point estimate, and the 95% confidence interval lines are given in blue. The odds are close to 1:1 (one Maori to one European among cases), which corresponds to a probability of being Maori of about ½.

[Figure: Histogram of pc. x-axis: Odds of Maori, if case; y-axis: Frequency.]

Similarly, the odds of being Maori, if control, are much lower…

[Figure: Histogram of pco. x-axis: Odds of Maori, if control; y-axis: Frequency.]

If we divide one set of values by the other to generate an odds ratio, then we get a Chi-square distribution for the odds ratio which looks like this:

[Figure: Histogram of or. x-axis: Odds ratio; y-axis: Frequency.]

Notice now that the distribution has changed from a symmetric distribution, based on the binomial (similar to the normal distribution), to an asymmetric distribution, characteristic of the Chi-square. Also note that the 95% confidence interval does not include the null effect of 1. We are, therefore, confident that the effect of ethnic group (Maori) on cot death is unlikely to be due to chance. We haven’t, however, ruled out a third factor which may be linked to both being Maori and developing cot death (confounding), such as cigarette smoking. Note the median (red line) is about the same as the calculated point estimate for the odds ratio (3.77).

[Figure: Histogram of nullor. x-axis: Odds ratio; y-axis: Frequency.]

Above, we have simulated what sort of results we would expect given the null hypothesis, that the odds of being Maori are the same for both cases and controls. The red line above represents the lower 95% confidence limit for the alternate hypothesis, which gives the 2 sided P-value.
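The idea of repeating the study many times can be imitated with a short simulation. The guide does not say how its histograms were generated, so the sketch below is only one plausible reading: it redraws the number of Maori among cases and among controls from binomial distributions with the observed proportions (165 of 348 cases, 262 of 1358 controls) and forms the odds ratio for each replicate. The seed, the number of replicates and all names are assumptions made for illustration.

```python
import random

random.seed(304)  # fixed seed so the sketch gives the same numbers each run


def simulate_odds(n_total, n_exposed, n_sims):
    """Redraw a binomial sample n_sims times and return the odds of exposure."""
    p = n_exposed / n_total
    odds = []
    for _ in range(n_sims):
        # One binomial draw, built as a sum of n_total Bernoulli trials.
        exposed = sum(1 for _ in range(n_total) if random.random() < p)
        # Keep the draw away from 0 and n_total so the odds stay finite.
        exposed = min(max(exposed, 1), n_total - 1)
        odds.append(exposed / (n_total - exposed))
    return odds


def percentile(values, q):
    """Simple percentile: the value at the rounded position q in the sorted list."""
    ordered = sorted(values)
    return ordered[round(q * (len(ordered) - 1))]


N_SIMS = 2000
odds_case = simulate_odds(165 + 183, 165, N_SIMS)      # odds of Maori, if case
odds_control = simulate_odds(262 + 1096, 262, N_SIMS)  # odds of Maori, if control
sim_or = [oc / oo for oc, oo in zip(odds_case, odds_control)]

median_or = percentile(sim_or, 0.5)
lower_or = percentile(sim_or, 0.025)
upper_or = percentile(sim_or, 0.975)
print(f"median OR = {median_or:.2f}, 95% range = ({lower_or:.2f}, {upper_or:.2f})")
```

The median of the simulated odds ratios should sit close to the point estimate, and the 2.5th and 97.5th percentiles play the role of the blue confidence interval lines in the figures.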
You can see that virtually no values fall outside this barrier, so the P-value is very small (<0.000001).

9. From the Select/If folder, click Cancel Select, and click OK. The Record Count now returns to the full sample size of 1862.

2: Continuing with unadjusted Effect Estimates

Exposure: Pacific vs European

10. Repeat step 7 above. From the Select/If folder, click Select. The Select dialogue box opens. Choose Ethnic=2 (Pacific Island) and Ethnic=3 (European), and click OK. Note that the Record Count is now 1435.
11. Click Tables. The Tables dialog box opens. From the Exposure Variable drop-down, select ETHNIC. From the Outcome Variable drop-down, select CASE, and click OK. Results appear in the output window.

The cross-product odds ratio shows that Pacific Island infants have 1.04 times the odds of cot death of European infants. This is not a statistically significant result, since the 95% confidence interval (0.65, 1.66) includes the reference value of 1. This is confirmed by a high p-value (>0.05). As for the Maori vs European comparison, this is illustrated for Pacific infants using scaled rectangle diagrams. The odds ratio is calculated similarly:

$$\frac{\text{Pacific cases (23)} \,/\, \text{Pacific controls (133)}}{\text{Euro cases (183)} \,/\, \text{Euro controls (1096)}}$$

If ethnic group was unrelated to cot death status, the scaled rectangle diagram would look like this:

As you can see, the two diagrams are not dramatically different. The P-value assesses the chance of this difference being due to random variation, and in this comparison, random variation is a likely explanation for the observed difference.

12. From the Select/If folder, click Cancel Select, and click OK. The Record Count now returns to the full sample size of 1862.

Exposure: Infant bed sharing

You want to see if infants who shared a bed with their parents (or other adults) while sleeping in the last two weeks have an increased risk of cot death.

1. From the Command Tree Statistics folder, click Tables.
The Tables dialog box opens.
2. Select the Exposure Variable of BEDSHARE.
3. Select the Outcome Variable of CASE.
4. Click OK. Results appear in the Output window.

The odds ratio of 2.14 (95% CI: 1.70, 2.70) indicates that infants who bed share have significantly increased odds of cot death, compared to infants who do not bed share.

This is again illustrated by a scaled rectangle diagram, with the odds ratio calculated similarly:

$$\frac{\text{Bed share cases (231)} \,/\, \text{Bed share controls (646)}}{\text{No bed share cases (141)} \,/\, \text{No bed share controls (844)}}$$

You can easily see that this is very different from what would be expected if the exposure and outcome were independent (unrelated):

A plot of the two odds is shown below (bed sharing: exposure; cot death: case). You can appreciate that the cases were much more likely to share the bed than controls. The 95% confidence intervals show that if these studies were repeated time and time again on different samples of the same population, we would still see a marked difference in odds.

[Figure: Odds ratio from case control study. x-axis: Odds of exposure; y-axis: Outcome category (case, control). OR = 2.14, 95% CI = 1.68, 2.72.]

3: Controlling for confounding using stratification (Mantel-Haenszel method)

Does household socioeconomic status confound the association between infant bed sharing and risk of cot death (SIDS)?

You have identified that infants who bed share have a higher risk of cot death. Now you want to see if household socioeconomic status (SES) is a confounder.

One way to solve the problem of confounding is to restrict comparisons to individuals who have the same value of the confounding variable (in this case, SES). Splitting the analysis up by SES allows us to assess the effect of bed sharing on cot death without the problem of variation in this confounding variable. The subsets of occupation which we use to split up the data are called strata, and so this process is known as stratification.
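To make the idea of stratification concrete, the sketch below tallies a separate 2x2 table for each level of the stratifying variable. The ten records are made up purely for illustration and are not rows from SIDS_EpiInfo; only the variable names and codes (CASE, BEDSHARE, OCCUPATION) follow the data dictionary. The cell labels a to d (exposed cases, exposed controls, unexposed cases, unexposed controls) and the function name are likewise assumptions of this sketch.

```python
from collections import defaultdict

# Hypothetical records for illustration only: (CASE, BEDSHARE, OCCUPATION).
# Codes follow the data dictionary: CASE 1 = case, 2 = control;
# BEDSHARE 1 = yes, 2 = no; OCCUPATION 1 (high) to 3 (low).
records = [
    (1, 1, 1), (1, 2, 1), (2, 1, 1), (2, 2, 1), (2, 2, 1),
    (1, 1, 3), (1, 1, 3), (2, 1, 3), (2, 2, 3), (1, 2, 3),
]

# Which cell of the 2x2 table each (CASE, BEDSHARE) combination belongs to.
CELL = {
    (1, 1): "a",  # exposed case
    (2, 1): "b",  # exposed control
    (1, 2): "c",  # unexposed case
    (2, 2): "d",  # unexposed control
}


def stratified_tables(rows):
    """Tally one 2x2 table per level of the stratifying variable."""
    tables = defaultdict(lambda: {"a": 0, "b": 0, "c": 0, "d": 0})
    for case, bedshare, occupation in rows:
        tables[occupation][CELL[(case, bedshare)]] += 1
    return dict(tables)


tables = stratified_tables(records)
for stratum, table in sorted(tables.items()):
    print(f"OCCUPATION={stratum}: {table}")
```

Each stratum now holds its own table, so the bed sharing comparison is made only among infants with the same SES category, which is what the Stratify by Variable option in the TABLES dialog does for the real data.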
Unless the effect of exposure on outcome differs substantially between strata (in which case we encounter a different issue which we will discuss later: effect modification), we usually wish to combine the evidence from the separate levels of SES, and summarise the effect controlling for the confounder. Strata with more individuals will tend to have a more precise estimate of the effect than strata with fewer individuals. We therefore account for this by taking a weighted average of the effects. The most common method of weighting is given by the Mantel-Haenszel estimate. In our example, for one level of socioeconomic status, we have the familiar two by two table:

Exposure | Cot Death | No Cot Death | Total
Bed share | a_i | b_i | a_i + b_i
No bed share | c_i | d_i | c_i + d_i
Total | a_i + c_i | b_i + d_i | N_i

The weight of each stratum is calculated by multiplying the number of unexposed cases by the number of exposed controls and dividing by the total number in that stratum:

$$\text{weight}_i = \frac{c_i \times b_i}{N_i}$$

The final Mantel-Haenszel odds ratio is calculated by summing (Σ) the products of the stratum specific weights and odds ratios and dividing by the sum of the weights:

$$OR_{MH} = \frac{\sum_i \left(\text{weight}_i \times OR_i\right)}{\sum_i \text{weight}_i}$$

This can get a bit messy if you have a lot of strata. We then compare the stratified odds ratio ($OR_{MH}$) with the crude odds ratio. If the change in the stratified effect estimate is greater than 10% (compared to the crude), we consider that confounding is likely to be present.

Thankfully, these calculations can be done in a straightforward manner in Epi Info. I will not show you how it is calculated here, but Epi Info also calculates a test of the null hypothesis, which, here, is that after controlling for socioeconomic status there is no effect of bed sharing on cot death (i.e. it tests whether the $OR_{MH}$ is sufficiently larger (or smaller) than one to be unlikely to be due to random error).

1. From the Command Tree Statistics folder, click Tables. The Tables dialog box opens.
2. Select the Exposure Variable of BEDSHARE.
3. Select the Outcome Variable of CASE.
4. Select the Stratify by Variable of OCCUPATION.
5. Click OK. Results appear in the Output window. A 2x2 table, with odds ratio calculations, appears for each of the 3 levels of OCCUPATION.

Scroll down to the bottom of the Output to see the summary information below. Note that the crude odds ratio (cross product) has changed from 2.65 to 2.24 after adjusting for SES. This indicates that OCCUPATION partially confounds the association between bed sharing and cot death, since the change in the odds ratio between crude and adjusted is more than 10%. The output also shows that the p-value for the test for interaction (effect modification) between strata is high (0.3), indicating that the variation in the stratum specific odds ratios is likely to be due to chance alone and is less likely to be attributable to a systematic effect.

These results are shown visually in a scaled rectangle diagram below.

Although the picture is getting quite complex now, you can see that if cot death, occupation and bed sharing were unrelated, the picture would be quite different (see below).

Calculation of stratum specific odds ratios is possible by considering case status and bed sharing within an occupational class. For example, the stratum specific odds ratio for the effect of bed sharing on cot death (case status) for high occupational status is limited to the purple upper rectangle, divided by case and bed sharing status:

$$\frac{\text{Bed share cases (38)} \,/\, \text{Bed share controls (215)}}{\text{No bed share cases (29)} \,/\, \text{No bed share controls (306)}}$$

The stratum specific odds ratio is therefore 1.86. You can see visually (from the scaled rectangle diagram below) that the ratios of these areas are similar. The other stratum specific odds ratios can be calculated in the same way. The same diagram shows the differences between the odds in the exposed and unexposed. Note, though, that the slope of each line represents the difference in odds (not the ratio) between cases and controls at each level of occupation.
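The Mantel-Haenszel weighting can be sketched in a few lines, with a = exposed cases, b = exposed controls, c = unexposed cases and d = unexposed controls in each stratum. Only the high SES stratum (38, 215, 29, 306) has its counts printed in this guide; the second stratum below is made up purely to show how the weighting behaves, and the function names are illustrative rather than anything Epi Info provides. The last helper applies the 10% rule of thumb to the crude and adjusted odds ratios quoted from the Epi Info output.

```python
def stratum_or(a, b, c, d):
    """Cross-product odds ratio for one stratum."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Weighted average of stratum odds ratios, with weight_i = c_i * b_i / N_i."""
    weights = [b * c / (a + b + c + d) for a, b, c, d in strata]
    weighted_ors = [w * stratum_or(*s) for w, s in zip(weights, strata)]
    return sum(weighted_ors) / sum(weights)


def percent_change(crude, adjusted):
    """Change in the effect estimate, as a percentage of the crude estimate."""
    return 100 * abs(crude - adjusted) / crude


# High SES stratum, counts as printed in this guide.
high_ses = (38, 215, 29, 306)
print(f"High SES stratum OR = {stratum_or(*high_ses):.2f}")  # prints 1.86

# Hypothetical second stratum (NOT from the SIDS data), for illustration only.
toy_strata = [high_ses, (40, 100, 20, 120)]
toy_mh = mantel_haenszel_or(toy_strata)
print(f"Toy Mantel-Haenszel OR = {toy_mh:.2f}")

# The 10% rule of thumb applied to the crude and adjusted values quoted above.
change = percent_change(2.65, 2.24)
print(f"Change from crude to adjusted = {change:.1f}%")
```

Because the summary is a weighted average, it always lies between the smallest and largest stratum specific odds ratios, and with a single stratum it reduces to that stratum's own odds ratio.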
You can see that the red (high) group is least likely to bed share, among both cases and controls, and the low (blue) group is most likely to bed share, among both cases and controls. The 95% confidence intervals for the individual odds are also shown for each stratum.

[Figure: Stratified case control analysis. Outcome = clogic, Exposure = blogic. x-axis: Odds of exposure; y-axis: Case, Control. OCCUPATION1: OR = 1.86 (1.08, 3.24); OCCUPATION2: OR = 2.04 (1.43, 2.92); OCCUPATION3: OR = 1.92 (1.22, 3.05); MH-OR = 1.97 (1.55, 2.5); homogeneity test P value = 0.949.]

Does ethnic group confound the association between infant bed sharing and risk of cot death (SIDS)?

You have identified that household SES partially confounds the association between infant bed sharing and risk of cot death. Now you want to see if ethnicity is also a confounder.

1. From the Command Tree Statistics folder, click Tables. The Tables dialog box opens.
2. Select the Exposure Variable of BEDSHARE.
3. Select the Outcome Variable of CASE.
4. Select the Stratify by Variable of ETHNIC.
5. Click OK. Results appear in the Output window. A 2x2 table, with odds ratio calculations, appears for each of the 3 levels of ETHNIC.

Note that the odds ratio varies between ethnic groups, being 2.33 for Maori (ETHNIC=1), 0.70 for Pacific Island (ETHNIC=2) and 1.50 for European infants (ETHNIC=3).

Scroll down to the bottom of the Output to see the summary information below. Note that the crude odds ratio (cross product) has changed from 2.13 to 1.63 after adjusting for ETHNIC. This indicates that ETHNIC partially confounds the association between bed sharing and cot death, since the change between crude and adjusted odds ratios is more than 10%.

However, also note, in the second row from the bottom, that the “Chi-square for differing Odds Ratios by stratum (interaction)” is 5.72 and the p-value for this is 0.0572.
This indicates that the odds ratios are on the borderline of differing significantly between the ethnic groups. This is called interaction, heterogeneity or effect modification, since ethnicity is modifying the effect of bed sharing on risk of cot death. When there is significant interaction between variables, we cannot report one adjusted odds ratio, controlling for the confounding variable, because the effect of bed sharing on cot death risk differs substantially between strata (in this case by ethnic group).

The scaled rectangle diagram is shown below. The white space represents the largest ethnic group, European. The ethnic specific odds ratio, for Maori, is:

$$\frac{\text{Bed share cases (134)} \,/\, \text{Bed share controls (31)}}{\text{No bed share cases (31)} \,/\, \text{No bed share controls (92)}}$$

You can see that the numerator odds is higher than the denominator, so the odds ratio will be high. This is illustrated in the graph below, with the red line showing the largest difference in the two odds. As you can see, the odds differences (similar to odds ratios), represented by the slopes of the lines connecting the odds, are much more heterogeneous than in the example which examines the effect of bed sharing on cot death, adjusted for socioeconomic status.

[Figure: Stratified case control analysis. Outcome = clogic, Exposure = blogic. x-axis: Odds of exposure; y-axis: Case, Control. ETHNIC1: OR = 2.33 (1.44, 3.86); ETHNIC2: OR = 0.7 (0.25, 2.07); ETHNIC3: OR = 1.5 (1.08, 2.08); MH-OR = 1.63 (1.27, 2.1); homogeneity test P value = 0.058.]

4: Interaction or effect modification: using 2x2 tables

• Use TABLES command to calculate unadjusted and adjusted odds ratios to control for confounding
• Use DEFINE, IF, ASSIGN to create new variables
• Use SELECT & CANCEL SELECT to select two exposure levels for calculating odds ratios, if there are more than two exposure levels.

Commands in this Lesson

The following new commands are used in this lesson.
LIST
The List command creates a listing of the current data table. Lists can be customized to list all, exclude, or show specific records. Located in the Statistics folder.

SORT
The Sort command specifies the sequence in which records will appear when using the LIST, GRAPH, or WRITE commands. SORT organizes the listed data in ascending or descending order, based on selected variables. For example, you can sort by last name or age. Located in the Select/If folder.

CANCEL SORT
The Cancel Sort command cancels a previous SORT command. Located in the Select/If folder.

Commands for creating new variables

ASSIGN
This command is used after the IF command to assign a new value to a variable.

RECODE
This command is used to create new variables, based on the values of other variables. You can use this command to create categories based on numerical (continuous) variables. Sparsely populated categories of a categorical variable may also be combined using this command.

DEFINE
This command lets you define the name of a new variable, which you will then fill in using ASSIGN and IF.

IF
This command lets you make conditional statements, so that if a condition based on the values of other variables is met, you then assign (using the ASSIGN command) a value to your new variable.

In the previous session, you did an analysis that showed effect modification by ethnicity of the association between infant bed sharing and risk of cot death. In this session you will use the TABLES command to analyse this effect modification in more detail, and determine whether the interaction is additive or multiplicative.

The increased risk of cot death associated with bed sharing by Maori infants suggests that some other variable, which occurs commonly among this ethnic group, combines with bed sharing to greatly increase the risk of cot death.
Extensive analysis of the NZ Cot Death data set in the 1990s revealed that one of the lifestyle variables associated with ethnicity was maternal tobacco smoking, which was more common among mothers of Maori infants than mothers of infants from other ethnicities (Mitchell EA, et al. Ethnic differences in mortality rate from sudden infant death syndrome in New Zealand. Brit Med J 1993; 306: 13-6). This suggested that there may be an interaction between maternal tobacco smoking and infant bed sharing.

You will explore this possibility in the data set by combining values for two variables (maternal tobacco smoking & infant bed sharing) into a single new combined variable with four exposure levels using the commands DEFINE, IF and ASSIGN, in order to calculate odds ratios for each combined exposure level.

A. CONVERT MATERNAL TOBACCO VARIABLE INTO A BINARY VARIABLE

The data dictionary for the SIDS_EpiInfo data set shows that the variable MOTHER_TOBACCO has 3 levels. This variable first needs to be converted into a binary variable before it can be combined with the infant bed sharing variable.

1. From the Command Tree Statistics folder, click Tables. The Tables dialog box opens.
2. Select the Exposure Variable of MOTHER_TOBACCO.
3. Select the Outcome Variable of CASE.
4. Click OK. Results appear in the Output window. A 3x2 table appears (below).

Note the small numbers of cases (n=6) and controls (n=28) with mothers who smoke tobacco occasionally (<1 cigarette per day). These numbers are too small to analyse as a separate group, and need to be combined with either group 1 (smoke daily) or group 2 (non-smokers). You will combine them with group 1 so that you have a clear comparison of smokers and non-smokers. To do this, you will define a new variable and use the RECODE command to combine both smoking groups together.

5. From the Variables folder, click Define. The DEFINE dialog box opens.
6. Type in the Variable Name space ‘Mother_smoke’.
Your Dialogue box should look like this (below). 40 7. Click OK. 8. From the Variables folder, click Recode. The RECODE dialog box opens. 9. From the From drop-down, select MOTHER_TOBACCO, and from the To drop-down, select Mother_smoke. a. Click your mouse in the top left hand cell of the table, and enter “1” in the left hand column with heading ‘Value (blank = other)’; b. then press ENTER twice to move to the top right hand cell, and enter “1” in the right hand column with heading ‘Recoded Value’; c. press ENTER to move to the next row. 10. Repeat steps 9a to 9c for the 2nd row by entering ‘2’ in the left hand column and ‘2’ in the right column. 11. Repeat steps 9a to 9c for the 3rd row by entering ‘3’ in the left hand column and ‘1’ in the right column. The RECODE dialogue box should look like this (below), with the left cell of the 4th row highlighted. 12. Click OK. 41 13. Use the TABLES command to check that you have correctly recoded MOTHER_TOBACCO into Mother_smoke so that all smokers are combined into a single group. 14. From the Command Tree Statistics folder, click Tables. The Tables dialog box opens. 15. Select the Exposure Variable of MOTHER_TOBACCO. 16. Select the Outcome Variable of ‘MOTHER_SMOKE’. 17. Click OK. Results appear in the Output window (below). Check that you have correctly recoded MOTHER_TOBACCO into ‘MOTHER_SMOKE’ and that there are no missing observations. 42 B. CREATE A SINGLE COMBINATION VARIABLE FROM MATERNAL TOBACCO SMOKING & INFANT BED SHARING Now that you have converted the variable MOTHER_TOBACCO into a new variable called ‘MOTHER_SMOKE’, which has two levels (smoker, non-smoker), this new variable can now be combined with the infant bed sharing variable (BEDSHARE) into a single variable called ‘Smoke_Share’, which has four levels as shown in the following table, using the commands DEFINE, IF and ASSIGN. 
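For readers who find it easier to follow logic than dialog boxes, the recode from section A and the four-level combination defined in the table in this section can be sketched in Python. This is only an illustration, not something Epi Info runs: the function names are hypothetical, and the sketch assumes BEDSHARE is coded 1 = yes and 2 = no, as the IF steps in this session imply.

```python
# Illustrative sketch only; function names are hypothetical, not Epi Info commands.

def recode_mother_smoke(mother_tobacco):
    """Mimic the RECODE step: daily (1) and occasional (3) smokers become 1,
    non-smokers (2) stay 2. Any other value is returned as None (missing)."""
    mapping = {1: 1, 2: 2, 3: 1}
    return mapping.get(mother_tobacco)


def smoke_share(mother_smoke, bedshare):
    """Mimic the four IF/ASSIGN passes that build Smoke_Share.
    Assumes both inputs are coded 1 = yes, 2 = no."""
    levels = {
        (1, 1): 1,  # mother smokes, infant bed shares
        (1, 2): 2,  # mother smokes, infant does not bed share
        (2, 1): 3,  # mother does not smoke, infant bed shares
        (2, 2): 4,  # neither exposure: the reference group
    }
    return levels.get((mother_smoke, bedshare))


# Example: an occasional smoker (3) whose infant bed shares (1)
print(smoke_share(recode_mother_smoke(3), 1))
```

Each dictionary entry corresponds to one IF ... THEN ASSIGN pass in Epi Info, which is why the dialog sequence in this section is repeated four times.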
Existing Variables              Combination Variable
MOTHER_SMOKE    BEDSHARE        SMOKE_SHARE
Yes             Yes             1
Yes             No              2
No              Yes             3
No              No              4

Note that infants coded ‘No’ for both ‘MOTHER_SMOKE’ and BEDSHARE, who are expected to have the lowest risk of cot death and therefore should be our reference group, will be given the value of ‘4’ to ensure they are on the bottom row for odds ratio calculations. EpiInfo assumes the reference group is on the bottom row when calculating odds ratios (or relative risks) with the TABLES command.

1. From the Variables folder, click Define. The DEFINE dialog box opens.
2. Type in the Variable Name space ‘Smoke_Share’. Your Dialogue box should look like this (below).
3. Click OK.
4. From the Select/If folder, click If. The IF dialog box opens.
5. From the Available Variables drop-down, select ‘Mother_smoke’, and use the buttons to make it equal to ‘1’.
6. Click the AND button.
7. From the Available Variables drop-down, select BEDSHARE, and use the buttons to make it equal to ‘1’. Your Dialogue box should look like this (below).
8. Click Then.
9. From the Variables folder (on the left of the original screen), click Assign. The ASSIGN dialog box opens.
10. From the Assign Variable drop-down, select ‘Smoke_Share’, and use the buttons to make it equal to 1. Your Dialogue box should look like this (below).
11. Click ADD. The ASSIGN dialogue box closes and the IF dialogue box reappears.
12. Click OK. You have created the first level of the new combination variable ‘Smoke_Share’.
13. Now you will create the second level of the new combination variable ‘Smoke_Share’. From the Select/If folder, click If (which is highlighted in blue). The IF dialog box opens.
14. From the Available Variables drop-down, select ‘Mother_smoke’, and use the buttons to make it equal to ‘1’. Click the AND button. From the Available Variables drop-down, select BEDSHARE, and use the buttons to make it equal to ‘2’. Your Dialogue box should look like this (below).
15. Click Then.
16. From the Variables folder, click Assign. The ASSIGN dialog box opens.
17. From the Assign Variable drop-down, select ‘Smoke_Share’, and use the buttons to make it equal to 2. Your Dialogue box should look like this (below).
18. Click ADD. The ASSIGN dialogue box closes and the IF dialogue box reappears (below).
19. Click OK. You have created the second level of the new combination variable ‘Smoke_Share’.
20. Now you will create the third level of the new combination variable ‘Smoke_Share’. From the Select/If folder, click If (which is highlighted in blue). The IF dialog box opens.
21. From the Available Variables drop-down, select ‘Mother_smoke’, and use the buttons to make it equal to ‘2’. Click the AND button. From the Available Variables drop-down, select BEDSHARE, and use the buttons to make it equal to ‘1’. Your Dialogue box should look like this (below).
22. Click Then.
23. From the Variables folder, click Assign. The ASSIGN dialog box opens.
24. From the Assign Variable drop-down, select ‘Smoke_Share’, and use the buttons to make it equal to 3. Your Dialogue box should look like this (below).
25. Click ADD. The ASSIGN dialogue box closes and the IF dialogue box reappears (below).
26. Click OK. You have created the third level of the new combination variable ‘Smoke_Share’.
27. Now you will create the fourth level of the new combination variable ‘Smoke_Share’. From the Select/If folder, click If (which is highlighted in blue). The IF dialog box opens.
28. From the Available Variables drop-down, select ‘Mother_smoke’, and use the buttons to make it equal to ‘2’. Click the AND button. From the Available Variables drop-down, select BEDSHARE, and use the buttons to make it equal to ‘2’. Your Dialogue box should look like this (below).
29. Click Then.
30. From the Variables folder, click Assign (which is highlighted in blue). The ASSIGN dialog box opens.
31. From the Assign Variable drop-down, select ‘Smoke_Share’, and use the buttons to make it equal to 4. Your Dialogue box should look like this (below).
32. Click ADD. The ASSIGN dialogue box closes and the IF dialogue box reappears (below).
33. Click OK. You have created the fourth and final level of the new combination variable ‘Smoke_Share’.
34. Click Tables, to check that you have correctly created the new combination variable ‘Smoke_Share’. You should have the output below.
35. Now you are able to calculate odds ratios for each of the first three rows, compared with row 4 as the reference, by using the SELECT command.

C. CALCULATE ODDS RATIOS OF COT DEATH ASSOCIATED WITH THE NEW COMBINATION VARIABLE CREATED FROM MATERNAL TOBACCO SMOKING & INFANT BED SHARING

Odds ratios (and relative risks) are only calculated by the TABLES command when there are two exposure levels. You will use the SELECT command to select two exposure levels at a time, so that you can calculate odds ratios.

1. From the Command Tree Select/If folder, click Select. The Select dialogue box opens. From the Available Variables drop-down, select the groups ‘Smoke_Share’=1 or 4, so that your dialogue box looks like below. The Record Count now is 1066.
2. Click Tables, to calculate the odds ratio comparing ‘Smoke_Share’ groups 1 and 4. You will also calculate an odds ratio adjusted for ethnicity, so:
   a. Select the Exposure Variable of ‘Smoke_Share’.
   b. Select the Outcome Variable of CASE.
   c. Select the Stratify by Variable of ETHNIC.
3. The dialogue box should look like below. Click OK. Results appear in the Output window. A 2x2 table, with odds ratio calculations, appears for each of the 3 levels of ETHNIC.

Note that the odds ratio is consistently high in all ethnic groups, being 4.53 for Maori (ETHNIC=1), 2.56 for Pacific Island (ETHNIC=2) and 5.52 for European infants (ETHNIC=3).
The p-value for the Chi-square test for differing odds ratios by stratum (interaction) is above 0.05 (p = 0.5712), indicating that the odds ratios do not vary significantly between ethnic groups. Scroll down to the bottom of the Output to see the summary information below. Note that the summary Mantel-Haenszel odds ratio is 4.85 (95% CI: 3.27 to 7.19). This is very high, although lower than the crude odds ratio (cross product) of 7.11, indicating that ETHNIC partially confounds the association between the combined exposure (maternal smoking plus bed sharing) and cot death.

The findings are again portrayed below, to give a visual summary of the excess odds of having a smoking mother and bed sharing, between cases and controls, for the three different ethnic groups (red = European; green = Pacific; blue = Maori). Note that exposure here means a mother who smokes and bed shares, compared with a non-smoking mother who does not bed share.

[Figure: Stratified case control analysis (Outcome = clogic, Exposure = ss). Axis labels: Case, Control; Odds of exposure. ETHNIC1: OR = 4.51 (1.95, 11.77); ETHNIC2: OR = 2.52 (0.56, 15.88); ETHNIC3: OR = 5.5 (3.4, 8.91); MH-OR = 4.85 (3.27, 7.19); homogeneity test P value = 0.559.]

4. Click Cancel Select, to return to the full data set. The Record Count now is 1863.

Repeat steps 1-4, but this time select the groups ‘Smoke_Share’=2 or 4 (this compares infants whose mothers smoke but do not bed share with infants whose mothers neither smoke nor bed share), so that your Record Count is now 985. Run the Tables command, with ‘Smoke_Share’ as the exposure variable, CASE as the outcome, and ETHNIC as the stratification variable.

Note that the odds ratio is elevated in all ethnic groups, being 1.77 for Maori (ETHNIC=1), 2.95 for Pacific Island (ETHNIC=2) and 2.99 for European infants (ETHNIC=3).
The p-value for the Chi-square test for differing odds ratios by stratum (interaction) is well above 0.05 (p = 0.5846), indicating that the odds ratios do not vary significantly between ethnic groups. The summary information at the bottom of the Output is shown below. The summary adjusted Mantel-Haenszel odds ratio is 2.70 (95% CI: 1.85, 3.93).

These findings are illustrated below.

[Figure: Stratified case control analysis (Outcome = clogic, Exposure = ss). Axis labels: Case, Control; Odds of exposure. ETHNIC1: OR = 1.76 (0.67, 5.06); ETHNIC2: OR = 2.87 (0.47, 21.59); ETHNIC3: OR = 2.99 (1.91, 4.68); MH-OR = 2.7 (1.85, 3.93); homogeneity test P value = 0.582.]

5. Click Cancel Select, to return to the full data set. The Record Count now is 1863.

Repeat steps 1-4, but this time select the groups ‘Smoke_Share’=3 or 4, so that your Record Count is now 1149. This compares infants whose mothers do not smoke but do bed share with infants whose mothers do neither. Run the Tables command, with ‘Smoke_Share’ as the exposure variable, CASE as the outcome, and ETHNIC as the stratification variable.

The ethnic-specific odds ratios are: 1.46 for Maori (ETHNIC=1), 0.57 for Pacific Island (ETHNIC=2) and 1.23 for European infants (ETHNIC=3). The p-value for the Chi-square test for differing odds ratios by stratum (interaction) is above 0.05 (p = 0.5733), indicating that the odds ratios do not vary significantly between ethnic groups. The summary information at the bottom of the Output is shown below. The summary adjusted Mantel-Haenszel odds ratio is 1.21 (95% CI: 0.82, 1.79).

This is illustrated graphically below. The point estimates show the odds of having a mother who does not smoke but does bed share, compared with a mother who neither smokes nor bed shares, among cases and controls in each ethnic group.
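To make the arithmetic behind the stratum-specific and summary odds ratios more concrete, here is a minimal Python sketch of the cross-product odds ratio and the Mantel-Haenszel summary odds ratio. The counts are hypothetical (the 2x2 tables from the Epi Info output are not reproduced in this text), so the results will not match the figures reported above, and the function names are illustrative only.

```python
# Illustrative sketch with HYPOTHETICAL counts; not the tutorial's data.
# Each stratum is (a, b, c, d):
#   a = exposed cases,   b = exposed controls,
#   c = unexposed cases, d = unexposed controls.

def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a single 2x2 table."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across strata:
    sum(a*d/n) / sum(b*c/n), where n is the stratum total."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Three hypothetical strata standing in for the three levels of ETHNIC
strata = [(20, 10, 30, 60), (5, 5, 10, 20), (40, 20, 100, 300)]

for i, table in enumerate(strata, start=1):
    print(f"Stratum {i}: OR = {odds_ratio(*table):.2f}")

# Crude OR: collapse the strata into one table by summing each cell
crude = odds_ratio(*[sum(cell) for cell in zip(*strata)])
print(f"Crude OR = {crude:.2f}")
print(f"MH OR    = {mantel_haenszel_or(strata):.2f}")
```

Comparing the crude and Mantel-Haenszel values is the same check the tutorial makes when it compares the crude odds ratio with the ethnicity-adjusted summary odds ratio.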
[Figure: Stratified case control analysis (Outcome = clogic, Exposure = ss). Axis labels: Case, Control; Odds of exposure. ETHNIC1: OR = 1.45 (0.54, 4.24); ETHNIC2: OR = 0.58 (0.1, 4); ETHNIC3: OR = 1.23 (0.76, 1.97); MH-OR = 1.21 (0.82, 1.79); homogeneity test P value = 0.579.]

6. Click Cancel Select, to return to the full data set for any further analyses. The Record Count now is 1863.
7. The summary adjusted odds ratios calculated for each level of the variable ‘Smoke_Share’ (adjusted for ethnic group) can now be added to a 2x2 table (as below) to help you interpret them. The odds ratios can be evaluated on the assumption that the effects of maternal smoking and bed sharing are additive, or that they are multiplicative.

                    Infant Bed Shares
Mother Smokes       Yes       No
Yes                 4.85      2.70
No                  1.21      1.00

On an additive basis, the increase in the odds ratio going from the reference value (1.00) to that for infants exposed to both risk factors (4.85) is 3.85. The change in odds ratio going from the reference (1.00) to that for infants exposed to maternal smoking only (2.70) is 1.70. The change in odds ratio going from the reference (1.00) to that for infants exposed to bed sharing only (1.21) is 0.21. The sum of the individual effects (1.70 + 0.21 = 1.91) is less than the combined effect (3.85). This indicates that the combined effect of being exposed to both maternal smoking and infant bed sharing is more than the sum of the individual effects of maternal smoking and bed sharing. Thus, there exists an interaction between these two exposures when they occur together.

The table is better explained on the basis of an interaction in which effects multiply. When the odds ratios for maternal smoking only (2.70) and bed sharing only (1.21) are multiplied together, the product is 3.27 (to 2 decimal places). This is closer to the observed odds ratio for infants exposed to both risk factors (4.85) than the value expected if the effects simply added (1.00 + 1.70 + 0.21 = 2.91), although the observed value is still higher.
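The additive and multiplicative comparisons can be checked with a few lines of Python. This is a sketch of the arithmetic only, taking the three adjusted odds ratios as they appear in the 2x2 table (4.85, 2.70 and 1.21); the variable names are illustrative.

```python
# Adjusted (Mantel-Haenszel) odds ratios from the 2x2 table; reference group = 1.00
or_both = 4.85        # mother smokes and infant bed shares
or_smoke_only = 2.70  # mother smokes, infant does not bed share
or_share_only = 1.21  # mother does not smoke, infant bed shares

# Expected joint odds ratio if the two excess effects simply add
expected_additive = 1 + (or_smoke_only - 1) + (or_share_only - 1)

# Expected joint odds ratio if the two effects multiply
expected_multiplicative = or_smoke_only * or_share_only

print(f"Observed joint OR:        {or_both:.2f}")
print(f"Expected if additive:     {expected_additive:.2f}")        # 2.91
print(f"Expected if multiplying:  {expected_multiplicative:.2f}")  # 3.27
print(f"Excess over additive:     {or_both - expected_additive:.2f}")
print(f"Ratio to multiplicative:  {or_both / expected_multiplicative:.2f}")
```

An observed joint odds ratio above the additive expectation signals interaction on the additive scale; a ratio to the multiplicative expectation close to 1 would mean a purely multiplicative model describes the joint effect well.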
Overall, the combined effect of being exposed to both maternal smoking and infant bed sharing is much closer to the product of the individual effects of maternal smoking and bed sharing than to their sum. This confirms the presence of a strong interaction between these two exposures when they occur together.

5: Introduction to logistic regression

Aim: to use logistic regression to analyse a case control study.

• Use TABLES command to calculate unadjusted and adjusted odds ratios to control for confounding
• Use DEFINE, IF, ASSIGN to create new variables
• Use SELECT & CANCEL SELECT to select two exposure levels for calculating odds ratios, if there are more than two exposure levels.

Commands in this Lesson

The following commands are used in this lesson.

Commands for multivariate analysis

LOGISTIC REGRESSION
This command is in the “Advanced Statistics” folder. It allows the user to undertake logistic regression to investigate multivariate relationships between exposures and outcomes.

What is Logistic Regression?

This is a form of regression which is commonly used for the analysis of case-control data, which have a binary outcome (two states: case or control). The general form of the logistic regression model, for p exposure variables, is:

$\log(\text{odds of outcome}) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_p x_p$

The difference between logistic and linear regression is that we model a transformation of the outcome variable: the log of the odds of the outcome. The quantity on the right hand side is known as the linear predictor of the log odds of the outcome. The β’s are the regression coefficients associated with the p exposure variables. The log odds is derived from the probability, or risk, π, of the outcome.
This is done using the “logit” function:

$\operatorname{logit}(\pi) = \log\left(\frac{\pi}{1-\pi}\right)$

While probabilities are constrained to values between 0 and 1, the log odds is not: the odds can take any value between 0 and infinity, and the log odds can take any value between minus infinity and plus infinity. $\beta_0$ is the log odds in the unexposed group, and $\beta_1$ to $\beta_p$ correspond to the log odds ratios associated with the various exposures.

We can use logistic regression as an alternative to stratification when controlling for confounding variables. The advantage of regression (over stratification) is that we can simultaneously control for the effects of a large number of variables, without losing statistical power or being constrained by small counts of individuals within strata. The disadvantage of modelling is that, although it has become technically easy to do with modern software, its use involves a number of assumptions, which the user must have some understanding of.

Before we discuss controlling for confounding, we need to make sure we know what properties of a particular exposure or attribute make it a confounder. As we discussed previously, ethnicity may confound the observed association between cot death and bed sharing if ethnicity influences both the exposure (bed sharing) and the outcome (cot death). Both of these are plausible, and causation can only run in this direction: it is unlikely that bed sharing or cot death would influence ethnicity.

Alternatively, if we consider the effect of maternal smoking on the outcome cot death, and speculate on what effect birth weight is likely to have on this relationship, we would point the arrows of causation in a different direction. Cigarette smoking during pregnancy reduces the birth weight of the foetus. Low birth weight infants are also at higher risk of cot death.
Instead of birth weight causing both the exposure and the outcome, we consider here that it is mediating this relationship. Adjusting for, or stratifying by, the third variable modifies the exposure-outcome effect in a way that depends on the likely direction of causation. If, as in the first example, the variable is a confounder, then you will be able to assess the true effect of exposure on outcome more accurately. If instead, as in the second example, the variable is a mediator, then you will likely underestimate the effect of the exposure.

Put another way, imagine that the variables are sources of water and the arrows represent the direction of flow between variables, so that the strength of a relationship is the flow rate in the pipe. In the first example, you can imagine turning on the ‘tap’ of bed sharing and seeing how much water ends up at the outcome, cot death. If you find water at this outcome, you might conclude that it has come from the tap of bed sharing. A confounding variable may, however, be responsible, with water coming from (for example) socioeconomic status causing the apparently observed flow into the cot death variable. To assess the relationship between bed sharing and cot death, differences in socioeconomic status therefore need to be removed, either by stratification or by regression. For maternal smoking and cot death, by contrast, a portion of the flow from smoking to cot death results from the effect of smoking on birth weight. If we control for birth weight, we will cut off some of this flow and underestimate the true effect of the exposure.

If, when using stratification or logistic regression modelling, we do not consider the direction of causal relationships, we can inaccurately estimate the true odds ratios of exposures. This occurs because adjusting for variables which are on the causal pathway between the exposure and outcome may diminish the effect of the exposure.
Establishing the direction of causal pathways, along with the relationships between variables, is why a thorough literature review is necessary before conducting an epidemiological study and beginning data analysis.

Commands in this Lesson

The following commands are used in this lesson.

Commands for creating new variables

DEFINE
This command lets you define the name of a new variable, whose values you then create using ASSIGN, RECODE and IF.

IF
This command lets you make conditional statements: if a condition based on the values of other variables is met, then a value is assigned (using ASSIGN) to your new variable.

ASSIGN
This command is used after the IF command to assign a new value to a variable.

RECODE
This command is used to create new variables, based on the values of other variables. You can use this command to create categories based on numerical (continuous) variables. Aggregating sparsely defined categorical variables may also be achieved using this command.

Commands for multivariate analysis

LOGISTIC REGRESSION
This command is in the “Advanced Statistics” folder. It allows the user to undertake logistic regression to investigate multivariate relationships between exposures and outcomes.

SELECT
The Select command specifies a condition that must be true for a record to be processed. Use it to select a set of records for analyses. For example, select records based on gender or zip code. Located in the Select/If folder.

Create a Logistic Regression model

You are going to use the LOGISTIC REGRESSION command to calculate an odds ratio and 95% Confidence Limits for selected variables, to see if they are significantly associated with having a disease. The LOGISTIC REGRESSION command produces an output with an odds ratio, 95% Confidence Limits, and p-value for each exposure (X) variable in the model.

In EpiInfo:
• the outcome (Y) variable should be: diseased (case) = YES, non-diseased (control) = NO;
• each exposure (X) variable should be: exposed = 1, unexposed = 0.

If we have more than two categories of exposure, for example Maori, Pacific and European, we can code these as 1 and 0 by assigning one category as the comparator (usually the one with the largest numbers; here European). Two “dummy variables” are then included, one with a value of 1 if the participant is Maori, and the other with a value of 1 if the participant is Pacific. Europeans will have values of 0 for both dummy variables. The beta coefficients associated with the Maori and Pacific variables can be used to estimate the odds ratios for these ethnic groups, compared with the comparator (European).

You should only use logistic regression after you have analysed your data using the TABLES command and know what odds ratios to expect from visualizing the distribution of study participants in the disease-exposure categories.

Before you can run the logistic regression, both outcome and exposure variables need to be converted to the format that is recognized by EpiInfo (above). READ the Excel file called SIDS_EpiInfo.

CREATE THE Y (OUTCOME VARIABLE)

For the outcome (Y) variable, CASE must be converted from 1 (for case) and 2 (for control) to YES (for case) and NO (for control).
To do this, you will need to define a new variable called SIDS and assign to it the appropriate values from the original variable CASE.

1. From the Variables folder, click Define. The DEFINE dialog box opens.
2. Type in the Variable Name space ‘SIDS’. Your Dialogue box should look like this (below).
3. Click OK.
4. From the Select/If folder, click If. The IF dialog box opens.
5. From the Available Variables drop-down, select CASE, and use the buttons to make it equal to 1. Your Dialogue box should look like this (below).
6. Click Then.
7. From the Variables folder (on the left of the original screen), click Assign. The ASSIGN dialog box opens.
8. From the Assign Variable drop-down, select ‘SIDS’, and use the buttons to make it equal to “Yes”. Your Dialogue box should look like this (below), with = (+) in the space under Expression.
9. Click ADD. The ASSIGN dialogue box closes and the IF dialogue box reappears.
10. Click ELSE.
11. You are then taken to the Variables folder on the left of the screen. Click on ASSIGN. Choose the variable SIDS from the Assign Variable drop-down menu and make it equal to “No”. The dialogue box should look like this (below), with = (-) in the space under Expression.
12. Click ADD. You should now be returned to the If dialogue box (below). Check that if CASE = 1, then SIDS = (+) (under THEN); and ELSE (i.e. CASE = 2) that SIDS = (-).
13. Use the TABLES command to check that you have correctly converted CASE into SIDS.

CREATING DUMMY X (EXPOSURE) VARIABLES

The names of dummy variables created below are in lower case to indicate they are new and not part of the original data set.

14. Now you will recode the X variable BEDSHARE, which also must be converted into 1 (for exposed) or 0 (for unexposed). To do this, you will need to define a new variable called Bedshare_LR and recode it to the appropriate values from the original variable BEDSHARE.
15. From the Variables folder, click Define. The DEFINE dialog box opens.
16. Type in the Variable Name space ‘Bedshare_LR’. Your Dialogue box should look like this (below).
17. Click OK.
18. From the Select/If folder, click If. The IF dialog box opens.
19. From the Available Variables drop-down, select BEDSHARE, and use the buttons to make it equal to 1. Your Dialogue box should look like this (below).
20. Click Then.
21. From the Variables folder (on the left of the original screen), click Assign. The ASSIGN dialog box opens.
22. From the Assign Variable drop-down, select ‘Bedshare_LR’, and use the buttons to make it equal to ‘1’. Your Dialogue box should look like this (below).
23. Click ADD. The ASSIGN dialogue box closes and the IF dialogue box reappears.
24. Click ELSE.
25. You are then taken to the Variables folder on the left of the screen. Click on ASSIGN. Choose the variable Bedshare_LR from the Assign Variable drop-down menu and make it equal to 0. The dialogue box should look like this (below).
26. Click ADD. You should now be returned to the If dialogue box (below). Check that if BEDSHARE = 1, then Bedshare_LR = 1 (under THEN); and ELSE (i.e. BEDSHARE = 2) that Bedshare_LR = 0.
27. Use the TABLES command to check that you have correctly converted BEDSHARE into Bedshare_LR.

CREATING THE DUMMY VARIABLES FOR ETHNICITY

You will now convert the variable ETHNIC into two dummy variables called Maori and Pacific, which will have the values shown in the table below. For the variable Maori, Maori = 1, and Pacific or European = 0. For the variable Pacific, Pacific = 1, and Maori or European = 0. There is no variable for European as they are by default the reference, so that the odds ratios calculated for Maori and Pacific will have European infants as the reference group.

Existing Variable       New Dummy Variables
ETHNIC                  Maori     Pacific
Maori (=1)              1         0
Pacific (=2)            0         1
European (=3)           0         0

1. To calculate the variable Maori, repeat the steps 14 to 27 above by:
   a. First using the DEFINE command to define the new variable called Maori;
   b. Then, from the variable ETHNIC, using the IF and ASSIGN commands to make Maori infants = 1, and all other infants (Pacific and European) = 0;
   c. The final IF dialogue box should look like that below.
2. To calculate the variable Pacific, repeat the steps 14 to 27 above by:
   a. First using the DEFINE command to define the new variable called Pacific;
   b. Then, from the variable ETHNIC, using the IF and ASSIGN commands to make Pacific infants = 1, and all other infants (Maori and European) = 0;
   c. The final IF dialogue box should look like that below.
3. Click OK.
4. Use the TABLES command to check that you have correctly recoded ETHNIC into the variables Maori and Pacific.

CREATE THE DUMMY VARIABLE FOR MATERNAL SMOKING

You will now convert the variable MOTHER_TOBACCO into a dummy variable called Mother_Smk_LR, so that infants of current smokers and occasional smokers are combined into a single group with the value 1, and infants of non-smokers are given the value 0.

1. To calculate the variable Mother_Smk_LR, repeat the steps 14 to 27 above:
   a. Use the DEFINE command to define the new variable called Mother_Smk_LR;
   b. Then, from the variable MOTHER_TOBACCO, use the IF and ASSIGN commands to make infants of current smokers (=1) and occasional smokers (=3) both equal to 1 for the new variable Mother_Smk_LR, and infants of non-smokers = 0;
   c. Note: in your first IF dialogue box, make sure you select MOTHER_TOBACCO = 1 OR MOTHER_TOBACCO = 3. Do not use the AND button, as no infants will be selected: none of them fulfils this condition (having a mother who is both a current smoker and an occasional smoker);
   d. The final IF dialogue box should look like that below.

B. EXAMPLE OF CONTROLLING FOR CONFOUNDING WITH A CATEGORICAL VARIABLE

You are now ready to start logistic regression analyses. You are going to run a model to estimate the risk of cot death associated with bed sharing, adjusting for ethnicity.
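Before running the model, it may help to see the whole set of recodes from the preceding sections in one place. The Python sketch below mirrors the IF ... THEN ... ELSE logic used to build SIDS, Bedshare_LR, Maori, Pacific and Mother_Smk_LR. It is an illustration rather than Epi Info syntax; the function names are hypothetical, and like the ELSE branch in Epi Info, each function sends every value that fails the IF condition (including any missing values) to the "unexposed" code.

```python
# Illustrative sketch; function names are hypothetical, codes follow the tutorial.

def sids(case):
    """CASE: 1 = case, 2 = control -> 'Yes' / 'No' outcome expected by Epi Info."""
    return "Yes" if case == 1 else "No"


def bedshare_lr(bedshare):
    """BEDSHARE: 1 = bed shares -> 1 (exposed); otherwise 0 (unexposed)."""
    return 1 if bedshare == 1 else 0


def ethnic_dummies(ethnic):
    """ETHNIC: 1 = Maori, 2 = Pacific, 3 = European (reference, both dummies 0)."""
    return {"Maori": 1 if ethnic == 1 else 0,
            "Pacific": 1 if ethnic == 2 else 0}


def mother_smk_lr(mother_tobacco):
    """MOTHER_TOBACCO: 1 (current) OR 3 (occasional) -> 1; otherwise 0.
    Note the OR: no record can be both 1 and 3, so AND would select nobody."""
    return 1 if mother_tobacco in (1, 3) else 0


# One hypothetical record: a Pacific case who bed shares, mother smokes occasionally
record = {"CASE": 1, "BEDSHARE": 1, "ETHNIC": 2, "MOTHER_TOBACCO": 3}
row = {"SIDS": sids(record["CASE"]),
       "Bedshare_LR": bedshare_lr(record["BEDSHARE"]),
       **ethnic_dummies(record["ETHNIC"]),
       "Mother_Smk_LR": mother_smk_lr(record["MOTHER_TOBACCO"])}
print(row)
```

Leaving European without its own dummy variable is what makes it the reference group: a European infant is the record for which both Maori and Pacific equal 0.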
The general form of logistic regression models is:

DISEASE (Y-variable) = EXPOSURE CONFOUNDER (both X-variables)

In this example you will run a logistic regression model to calculate the odds ratio of cot death associated with bed sharing, adjusting for ethnicity as a categorical variable. The model is:

SIDS = Bedshare_LR Maori Pacific

Note: both ethnic variables need to be in the model to ensure the reference group is European.

1. From the Command Tree Advanced Statistics folder, click Logistic Regression. The LOGISTIC dialog box opens.
2. From the Outcome Variable drop-down, select SIDS.
3. From the Other Variables drop-down, select the variables Bedshare_LR, Maori and Pacific. The dialogue should look like this below.
4. Click OK. The results of the logistic regression analysis appear in the Output window.

At the top, ‘LOGISTIC SIDS = Bedshare_LR Maori Pacific’ specifies the variables in the model. SIDS is the outcome (disease or Y) variable. Bedshare_LR, Maori and Pacific are the exposure (or X) variables. The following information is provided for each of the exposure variables:
• odds ratio and 95% Confidence Limits,
• calculated beta-coefficient (the antilog, i.e. the exponential, of this coefficient is the odds ratio),
• S.E., the standard error of the beta-coefficient,
• and the Z-statistic, which is used to derive the p-value for the odds ratio. This is underlined if the p-value is <0.05, highlighting that it is statistically significant.

The output in the row for the CONSTANT term is not important for the purposes of this Module, and can be ignored.

The diagram below represents the results from a comparable stratified analysis, which may help you interpret the results. The outer box represents the total study population, with the smaller box on the left representing Maori participants and the box on the right representing Pacific participants. Those not in either of those boxes are European. The large central box represents bed sharers. The numbers in the boxes are the odds of being a case for different combinations of characteristics in the population. The more red colour (or dark shade) you see, the higher the odds of cot death for those people.

The first odds ratio, for bed sharing (adjusted for ethnic group in the Epiinfo output box), is 1.6 (95% CI 1.3 to 2.1). This means that, in this population, Europeans (the reference group for ethnicity) who bed share have about 60% higher odds of cot death than Europeans who do not bed share. The diagram above shows the stratified output, with the equivalent estimate of 1.56. The crude odds ratio is 2.13 (see next page), which pools all ethnic groups together. What does this suggest about the relationship between ethnic group, bed sharing and cot death? The difference between the crude and adjusted estimates is greater than 10%, so ethnic group confounds the relationship between bed sharing and cot death. Adjusting for ethnic group reduces the strength of the effect of the exposure, which is commonly observed when one adjusts for a confounding variable.

From the logistic output, the odds ratio for Maori is 3.2. What does this mean? It suggests that Maori who do not bed share have odds of cot death 3.2 times those of Europeans who do not bed share (compare with the stratified estimate of 2.4 in the SPAN diagram). If they bed share as well, their odds of cot death are multiplied further (3.2 × 1.6 = 5.12). This is close to the stratified estimate (odds ratio 5.53) presented in the above SPAN diagram. This stratified analysis contrasts with the crude odds ratio, shown below, which masks the variation between ethnic groups.

C. EXAMPLE OF CONTROLLING FOR CONFOUNDING WITH A CONTINUOUS VARIABLE

In this model you will run a logistic regression to adjust for age as a continuous variable. The variable you will add to the model is MOTHER_AGE (in years). The model is:

SIDS = Bedshare_LR MOTHER_AGE

1.
From the Command Tree Advanced Statistics folder, click Logistic Regression. The LOGISTIC dialog box opens. 2. From the Outcome Variable drop-down, select SIDS. 3. From the Other Variables drop-down, select Bedshare_LR and MOTHER_AGE. 4. Click OK. The results of the logistic regression analysis appear in the Output window 77 The odds ratio for Bedshare_LR is now 1.96. The odds ratio for MOTHER_AGE is 0.91, and highly significant (p<0.0001) despite this value being close to 1.00. this is because the value 0.91 is the decrease in the risk of cot death for each one year of increase in the mother’s age. For example, a 5 year increase in age, the odds ratio for cot death is (0.91)5 = 0.62. For a 10 year increase in age, the odds ratio is (0.91)10 = 0.39. 78 Extra for experts (not examinable!) You can see the effect plotted above, so that bed sharing increases the probability of being a case, whereas increasing maternal age reduces the risk of cot death. Notice that a relationship is forced by the model between maternal age and risk of cot death. Generally, it is advisable to check this relationship first, by categorising the independent variable so that the assumption of linearity may be checked. If we combine these two effects into the same graph, and extrapolate the model beyond the observed range of maternal ages, we observe the logistic function that is forced on the effect of maternal age on risk of cot death (by bedsharing status). You can see that bed sharers are at increased risk of cot death for all values of maternal age. The actual observations themselves, are plotted at the top and bottom of the graph. As you can see, there is considerable overlap between the age of cases and of controls, however, controls tend to be slightly older than cases (median 27.8 vs 24.9 years). 
If a “U” shaped effect of maternal age on risk of cot death were present, in which mid-range values of maternal age were low risk and extreme values (low or high) were high risk, we wouldn’t be able to pick it up. The logistic function, fitted to a continuous variable, assumes a dose-response effect and also that the steepest change in risk of the outcome occurs in the middle of the range of x-values, with risk plateauing at the extremes of the range.

[Figure: MOTHER_AGE effect plot and bed_share effect plot; y-axis: case]

[Figure: probability of case by maternal age, for Bedshare and No Bedsharing]

Despite being a highly significant effect, maternal age does not confound the relationship between bed sharing and cot death. Why? Because the adjusted odds ratio for bed sharing (1.96) is less than 10% lower than the crude odds ratio (2.13). Although maternal age is related to cot death, it is either not related to bed sharing or balanced among bed sharers and non-bed sharers, so it does not exert a confounding influence on the exposure of interest.

Before you finish, be certain to save your work by clicking “save” in the program editor. Then click the “text file” button. Then navigate to the folder on the computer where you want to save the file, name the file and press “ok”. The file will be saved as a plain text file with a .pgm extension.

The next time you want to pick up where you left off, simply open “Analyze data”, click “Open” in the program editor, then click “text file” in the dialogue box, then navigate to the program that you saved at the end of the previous session. Then click “Run” in the program editor and Epi Info will rerun the commands that you covered in the previous session, opening the original dataset and making the new variables that you’ve created. You may have to wait for a brief period while the program runs.
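The arithmetic in example C can be checked outside Epi Info. The following is a minimal Python sketch, not part of the tutorial's workflow: the function names are invented for illustration, and the odds ratios are simply the values quoted in the text. It scales a per-year odds ratio to a multi-year change and compares crude and adjusted estimates against the 10% rule of thumb.

```python
def scale_odds_ratio(or_per_unit: float, units: float) -> float:
    """Odds ratio for a change of `units` in a continuous variable,
    given the odds ratio for a one-unit change (hypothetical helper)."""
    return or_per_unit ** units


def percent_change(crude: float, adjusted: float) -> float:
    """Percentage change from the crude to the adjusted odds ratio
    (hypothetical helper for the 10% change-in-estimate rule of thumb)."""
    return abs(crude - adjusted) / crude * 100


# Odds ratio per year of maternal age, as quoted in the text
or_age = 0.91
print(round(scale_odds_ratio(or_age, 5), 2))   # 5-year increase
print(round(scale_odds_ratio(or_age, 10), 2))  # 10-year increase

# Crude bed sharing odds ratio (2.13) versus the two adjusted estimates
print(round(percent_change(2.13, 1.96), 1))  # adjusted for maternal age
print(round(percent_change(2.13, 1.6), 1))   # adjusted for ethnic group
```

The change is about 8% after adjusting for maternal age (below the 10% threshold) and about 25% after adjusting for ethnic group (above it), which agrees with the conclusions drawn in the text.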
6: Using logistic regression to investigate effect modification

D. EXAMPLE OF USING LOGISTIC REGRESSION TO MODEL EFFECT MODIFICATION BY CREATING AN INTERACTION TERM

In Module 4, when you used the TABLES command to calculate an odds ratio for outcome variable CASE in relation to exposure variable BED_SHARE, adjusting for ETHNIC, the very bottom of the output screen showed a p-value = 0.058, suggesting that the odds ratios varied between strata (i.e. heterogeneity). This is an example of effect modification, with the ETHNIC variable modifying the odds ratio between BED_SHARE and CASE.

One way to model effect modification is to multiply two variables (called the main variables) together to get an interaction variable. The model is:

SIDS = var A var B (var A)*(var B)

where A = Bedshare_LR; B = Mother_Smk_LR

You will use a button in the LOGISTIC dialogue box to create the interaction term. The variables Bedshare_LR and Mother_Smk_LR have already been created above.

1. From the Advanced Statistics folder, click Logistic Regression. The LOGISTIC dialog box opens.
2. From the Outcome Variable drop-down, select SIDS.
3. From the Other Variables drop-down, select ‘Bedshare_LR’ and ‘Mother_Smk_LR’. Click on both variable names (while holding down the shift key) so that they are both highlighted in blue. The ‘Make Dummy’ button immediately changes to ‘Make Interaction’.
4. Click on the ‘Make Interaction’ button. A new interaction variable (Bedshare_LR*Mother_Smk_LR) appears in the ‘Interaction Terms’ space. Your Dialogue box should look like this (below).
5. Click OK. The results of the logistic regression analysis appear in the Output window.

The significant odds ratio for the interaction term (OR = 1.74, p=0.029) indicates that a multiplicative interaction between maternal smoking and bed sharing is present.
The odds ratios are interpreted as shown in the following table, with the total odds ratio for infants exposed to both risk factors being the product of the main effects for bed sharing and maternal smoking times the interaction effect.

Exposed to        Exposed to maternal smoking
bed sharing       Yes                            No
Yes               1.35 x 3.05 x 1.74 = 7.16      1.35
No                3.05                           1.00

The effect plot is shown below:

[Figure: bed_share*mother_smoke effect plot; x-axis: bed_share (0, 1); y-axis: case; panels: mother_smoke : 0 and mother_smoke : 1]

The effect plot (above) shows that the effect of bed sharing, illustrated by the slope of the line, is steeper (a stronger effect) in maternal smokers (mother_smoke=1) than in non-smokers (mother_smoke=0).

The SPAN diagram is shown below, with the red squares having the highest probability (similar to odds) of being a case. The numbers inside the boxes are odds ratios compared to infants who neither bed share nor have mothers who smoke. The SPAN diagram stratified analysis is very similar to the output of the logistic model.

In contrast with the previous analyses (page 59), the odds ratios for ‘Bedshare_LR’ and ‘Mother_Smk_LR’ are weaker because much of their effect has been taken up by the interaction term.

6. Now repeat steps 1 to 5 above, and run a logistic regression model which also adds the ethnic dummy variables Maori and Pacific to the above model.
7. Your Logistic Dialogue box should look like this (below).
8. Click OK. The results of the logistic regression analysis appear in the Output window.

The interaction term is no longer significant (p=0.0975). However, the odds ratios still have the same pattern as above, as shown in the table below. These odds ratios are very similar to those from the analyses you did in Session 4 using the TABLES command to calculate Mantel-Haenszel odds ratios for the effect of bed sharing on cot death, adjusted for ethnicity (page 36).

Exposed to        Exposed to maternal smoking
bed sharing       Yes                            No
Yes               1.27 x 2.67 x 1.53 = 5.19      1.27
No                2.67                           1.00

A SPAN diagram illustrates this effect below, with the numbers showing odds ratios compared with the baseline group (European infants whose mothers neither smoke nor bed share). The increased risk of cot death is illustrated by the deep red colour.

When these individual effects are combined, one can see that the overall probability of cot death increases dramatically in the highest risk groups that combine all risk factors. For example, Maori infants whose mothers smoke and bed share have a 10-fold increased risk of cot death compared to European infants who do not bed share and whose mothers do not smoke. The SPAN diagram reports stratified estimates, whereas the equivalent logistic regression odds ratio for Maori infants whose mothers both bed share and smoke is 1.98*1.27*2.67*1.53=10.27. The increased risk associated with Maori ethnic group is not seen in the table above.

The effect plot is shown below. On the left you see the probability of being a case, by ethnic group, derived from the logistic model. The red, dashed line shows the 95% confidence interval for the estimate. Clearly Maori are at higher risk of cot death than the other ethnic groups. The narrower confidence interval surrounding the European estimate reflects the larger sample size in this group compared to the other ethnic groups. On the right, the interaction between maternal smoking and bed sharing and the risk of cot death is portrayed. You can see that the slope of the line, indicating the effect of bed sharing on cot death, is much steeper in smoking mothers than in non-smoking mothers. These different gradients (effects) indicate effect modification.
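The combined odds ratios quoted above are products of the individual odds ratios from the logistic model. As a minimal sketch (the helper name is invented for illustration, and the inputs are the odds ratios quoted in the text), the multiplication can be reproduced in Python:

```python
from math import prod


def combined_odds_ratio(*odds_ratios: float) -> float:
    """Multiply odds ratios from a logistic model to get the odds ratio
    for a group with several characteristics, relative to the baseline
    group (hypothetical helper)."""
    return prod(odds_ratios)


# Unadjusted model: bed sharing, maternal smoking, interaction term
print(round(combined_odds_ratio(1.35, 3.05, 1.74), 2))

# Ethnicity-adjusted model: the same three terms
print(round(combined_odds_ratio(1.27, 2.67, 1.53), 2))

# Maori infants who bed share and whose mothers smoke (Maori OR = 1.98)
print(round(combined_odds_ratio(1.98, 1.27, 2.67, 1.53), 2))
```

Multiplying odds ratios in this way is equivalent to adding the corresponding beta-coefficients on the log-odds scale and then taking the exponential.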
[Figure: eth_cat effect plot (x-axis: eth_cat, with European, Maori and Pacific; y-axis: case) and bed_share*mother_smoke effect plot (x-axis: bed_share; y-axis: case; panels: mother_smoke : 0 and mother_smoke : 1)]

Before you go, don’t forget to save your work (page 22)!

7: Using logistic regression to examine effect modification (2)

Another way to model effect modification is to create dummy variables for each group of exposures when you combine two variables. For example, the variables for maternal smoking (Mother_Smk_LR) and infant bed sharing (Bedshare_LR) can be combined to create 4 levels, as shown in the two left-hand columns in the table below.

Existing Variables                New Dummy Variables
Mother_Smk_LR    Bedshare_LR      Smoke_Share    Smoke_only    Share_only
Yes (=1)         Yes (=1)         1              0             0
Yes (=1)         No (=0)          0              1             0
No (=0)          Yes (=1)         0              0             1
No (=0)          No (=0)          0              0             0

Three dummy variables (as shown in the table above) can be created from these 4 levels. Logistic regression can then be used to model effect modification (or interaction) by running the following model. The model is:

SIDS = Smoke_Share Smoke_only Share_only

1. From the Variables folder, click Define. The DEFINE dialog box opens.
2. Type ‘Smoke_Share’ in the Variable Name space. Your Dialogue box should look like this (below).
3. Click OK.
4. From the Select/If folder, click If. The IF dialog box opens.
5. From the Available Variables drop-down, select Mother_Smk_LR, and use the buttons to make it equal to 1.
6. Click the AND button.
7. From the Available Variables drop-down, select Bedshare_LR, and use the buttons to make it equal to 1. Your Dialogue box should look like this (below).
8. Click Then.
9. From the Variables folder, click Assign (which is highlighted in blue). The ASSIGN dialog box opens.
10. From the Assign Variable drop-down, select Smoke_Share, and use the buttons to make it equal to ‘1’ (for infants exposed to both maternal smoking and bed sharing). Your Dialogue box should look like this (below).
11. Click ADD. The ASSIGN dialogue box closes and the IF dialogue box reappears.
12. Click ELSE.
13. You are then taken to the Variables folder on the left of the screen. Click on ASSIGN. Choose the variable Smoke_Share from the Assign Variable drop-down menu and make it equal to 0. The dialogue box should look like this (below).
14. Click ADD. You should now be returned to the IF dialogue box (below). Check that if both Mother_Smk_LR = 1 AND Bedshare_LR = 1, then Smoke_Share = 1 (under THEN); and ELSE (i.e. for all other infants) that Smoke_Share = 0.
15. Now you will create the dummy variable called Smoke_only for infants exposed only to maternal smoking. To calculate this, repeat steps 1 to 14 above by:
   a. First using the DEFINE command to define the new variable called Smoke_only;
   b. Then, from the variables Mother_Smk_LR and Bedshare_LR, using the IF and ASSIGN commands to make infants who are exposed only to maternal smoking (and not bed sharing) = 1, and all other infants = 0;
   c. The final IF dialogue box should look like this below.
16. Now you will create the dummy variable called Share_only for infants exposed only to bed sharing. To calculate this, repeat steps 1 to 14 above by:
   a. First using the DEFINE command to define the new variable called Share_only;
   b. Then, from the variables Mother_Smk_LR and Bedshare_LR, using the IF and ASSIGN commands to make infants who are exposed only to bed sharing (and not maternal smoking) = 1, and all other infants = 0;
   c. The final IF dialogue box should look like that below.
17. To check that you have correctly created the new combination variables Smoke_Share, Smoke_only and Share_only, sort the data set by each of these variables in turn. Then use the LIST command to list each of these 3 new variables with Mother_Smk_LR and Bedshare_LR, and check that the new combination variables are correct.

You are now ready to run the logistic regression model with the new dummy variables.
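The logic behind the DEFINE, IF and ASSIGN steps above can be written compactly. The following is a minimal Python sketch rather than Epi Info syntax; the function name is invented for illustration, and it assumes both inputs are coded 0/1 with no missing values.

```python
def combination_dummies(mother_smk: int, bedshare: int) -> dict:
    """Return the three dummy variables for one infant, given 0/1 codes
    for maternal smoking and bed sharing (hypothetical helper; assumes
    no missing values)."""
    return {
        "Smoke_Share": int(mother_smk == 1 and bedshare == 1),
        "Smoke_only": int(mother_smk == 1 and bedshare == 0),
        "Share_only": int(mother_smk == 0 and bedshare == 1),
    }


# List every combination, mirroring the check done with the LIST command
for mother_smk in (1, 0):
    for bedshare in (1, 0):
        print(mother_smk, bedshare, combination_dummies(mother_smk, bedshare))
```

Infants exposed to neither factor score 0 on all three dummy variables, which is what makes them the reference group in the logistic model.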
All three dummy variables must be included to get appropriate odds ratios. Remember, the model is:

SIDS = Smoke_Share Smoke_only Share_only

18. From the Command Tree Advanced Statistics folder, click Logistic Regression. The LOGISTIC dialog box opens.
19. From the Outcome Variable drop-down, select SIDS.
20. From the Other Variables drop-down, select Smoke_Share, Smoke_only and Share_only (see Dialogue box below).
21. Click OK. The results of the logistic regression analysis appear in the Output window.

The odds ratios from this output can be inserted into the table below to help their interpretation. The values in this table are exactly the same as those in the corresponding table in the previous section (page 59), where you created a multiplicative interaction term by multiplying the maternal smoking and bed sharing variables with each other.

Exposed to        Exposed to maternal smoking
bed sharing       Yes        No
Yes               7.16       1.35
No                3.05       1.00

22. Now repeat steps 18 to 21 above, and run a logistic regression model which also adds the ethnic dummy variables Maori and Pacific to the above model.
23. Your Logistic Dialogue box should look like this (below).
24. Click OK. The results of the logistic regression analysis appear in the Output window.

The odds ratios from this output can be inserted into the table below to help their interpretation. The values in this table are almost the same as those in the corresponding table in the previous section (page 59), where you created a multiplicative interaction term by multiplying the maternal smoking and bed sharing variables with each other, and also adjusted for ethnicity. These odds ratios are also very similar to those from the analyses you did in Session 5 using the TABLES command to calculate Mantel-Haenszel odds ratios adjusted for ethnicity (page 59).

Exposed to        Exposed to maternal smoking
bed sharing       Yes        No
Yes               5.18       1.27
No                2.67       1.00

The equivalent SPAN diagram, with stratified odds ratios, is laid out below.
The increased intensity of cases (red colour) is illustrated in smokers who identify as Maori.

Don’t forget to save your work (page 22)!

References

1 Scragg R, Mitchell EA, Taylor BJ, Stewart AW, Ford RP, Thompson JM, Allen EM, Becroft DM: Bed sharing, smoking, and alcohol in the sudden infant death syndrome. New Zealand Cot Death Study Group. BMJ 1993;307:1312-1318.
2 Marshall RJ: Scaled rectangle diagrams can be used to visualize clinical and epidemiological data. J Clin Epidemiol 2005;58:974-981.
3 Chongsuvivatwong V: Analysis of epidemiological data using R and Epicalc.
4 Jennings LC, MacDiarmid RD, Miles JAR: A study of acute respiratory disease in the community of Port Chalmers. I. Illnesses within a group of selected families and the relative incidence of respiratory pathogens in the whole community. Journal of Hygiene 1978;81:49-66.

8: Sample size using Statcalc

Today we will be using Statcalc, an Epi Info utility, to estimate the sample size for a planned study. We will be going through a real scenario.

You have recently got a job in the university as a research fellow on a project looking at diabetes prevalence in New Zealand. Your funding for next year looks like it may dry up and you could be facing dreaded unemployment. The Health Research Council puts out a “request for proposals” on research which may prevent H1N1 infection. $250,000 is to be made available. Another professor in the department suggests looking at the effect of a large one-off dose of vitamin D on the incidence of upper respiratory infection in a randomized study. He gives you some help, but you have the task of sorting out the nuts and bolts of the grant application and designing the study.

Background

Interventions to reduce the burden of infectious disease usually use the triad of the host, agent and environment as a theoretical model to structure interventions.
In the host, biological protection against infection consists of both innate and agent-specific (or humoral) immunity. In H1N1 infection, national pandemic preparedness plans emphasise the rapid deployment of both antiviral treatment and specific vaccination that provokes a response from the humoral immune system. Both interventions are limited by expense and possible lack of effect. For example, resistance to oseltamivir may develop due to mutation and widespread use, or a vaccine may be slow to develop and test for clinical efficacy because novel antigens may be needed for its manufacture. The impact of H1N1 on a population may be severe before these issues are identified and useful alternatives developed. Interventions to enhance innate immunity, which would avoid the limitations of antiviral therapy and vaccination, have not been considered. To explore how vitamin D may reduce the impact of H1N1 infection, we first define the infections we seek to measure and the status of vitamin D in New Zealand populations, then explore the evidence that links vitamin D with the immune response to such infections.

Study design

A randomised, double blind, controlled trial is proposed. The two arms will be vitamin D supplementation (500,000 IU) and placebo, and participants will be recruited from primary care at the time of their annual influenza vaccination. Such a dose has been shown to safely raise 25OHD levels to ≥ 80 nmol/L for at least three months, without inducing hypercalcemia. Although in the real example we considered the counts of respiratory infections, here we will consider the outcome as a binary variable for simplicity (infection or no infection after one year).

How many participants are needed in your study?

As part of the grant writing exercise, you need to consider how many participants you will recruit. A sample size calculation is required. Let’s quickly review the rationale for a sample size calculation.
After conducting your experiment, you can reach one of four conclusions. The two boxes in the following table labelled “OK” are where you want to end up. You want to minimize the chance of accepting the null hypothesis when it is false (type 2 error), which is a risk when you have too few participants, and the chance of rejecting the null hypothesis when it is in fact true (type 1 error).

Null hypothesis        Test result
(no difference)        Accept null      Reject null
True                   OK               Type 1 error
False                  Type 2 error     OK

How do you go about the calculation? You look up the formula in the textbook:

[Formula: textbook sample size equation]

where:

• n = sample size in each group
• π0 = risk in the unexposed
• π1 = risk in the exposed
• u = one-sided percentage point of the normal distribution corresponding to 100% minus the power (e.g. if the power is 90%, 100% - 90% = 10% and u = 1.28)
• v = percentage point of the normal distribution corresponding to the two-sided significance level required (e.g. for 5%, v = 1.96)

That looks ugly! Fortunately, you’re not the first person to undertake a sample size calculation. Others have decided to write computer programs to make life easier for you. The Statcalc utility has just such a feature!

What information is needed? From above, two bits of information are “no brainers”. It is standard practice to accept the 5% (1/20) level for the probability of a type 1 error (which sets v), whereas the probability of a type 2 error is usually fixed at 20%, that is, a power of 80% (which sets u). All we are left with is the risk in the exposed and unexposed.

In a community survey completed about 30 years ago, you find that the mean rate of infections is 2 per person per year.[4] You then wonder what proportion of individuals are likely to have no infections and what proportion would have at least one. You go down the corridor and ask a statistician for help. He (or she) says that if we assume this number comes from a Poisson distribution (commonly used for count data) (figure 1), then we expect the proportion of people with no infections to be about 14%, so that 86% (100 - 14) of people will have at least one infection during the year.
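The statistician's figures follow from the Poisson probability of zero events, which is exp(-mean). A minimal Python sketch of that step is given below; the function name is invented for illustration, and the mean of 2 infections per person per year is the survey value quoted above.

```python
from math import exp


def prob_at_least_one(mean_count: float) -> float:
    """Probability of one or more events when the count follows a
    Poisson distribution with the given mean: 1 - P(0) = 1 - exp(-mean)."""
    return 1 - exp(-mean_count)


mean_infections = 2  # infections per person per year, from the survey
p_none = exp(-mean_infections)
print(f"No infections: {p_none:.1%}")
print(f"At least one infection: {prob_at_least_one(mean_infections):.1%}")
```

The same function can be reused for any other assumed mean number of infections in the treated group.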
[Figure 1. Poisson distribution, mean = 2 infections; x-axis: number of infections; y-axis: probability mass]

If we believe that our treatment is likely to reduce the number of infections by about 20%, reducing the population mean to 1.6 infections, then our statistician informs us that about 80% (figure 2) of the treatment group will have at least one respiratory infection. Great, we now have all the information to go ahead and make our sample size calculation.

[Figure 2: Poisson distribution, mean = 1.6; x-axis: number of infections; y-axis: probability mass]

To do so, open Epi Info. In the top menu, click Utilities, then Statcalc. You should get this. Use the arrow keys to select “Sample size & power”. Then select “Cohort or Cross-sectional”. Press Enter. You will then see this:

Here, you are given all the inputs required for estimating sample size for your grant application. You can leave the first three settings and scroll to the fourth using the arrow keys. We have already worked out that the frequency of disease in the unexposed group is probably about 86%. We think that the frequency of disease in the exposed group will be about 80%; enter this on the last line. Press Enter when you are finished. Your screen should look something like this. Press F4 to complete the calculation. Reading along the top line you see that you will need a total of 1294 people.

You present these findings to your boss. He says you must be joking if you think you’re going to recruit that number of people in one year at the start of winter with the budget being offered. He suggests redoing the calculation with a 40% reduction in respiratory infections. This time, you think your treatment will reduce the number of infections to an annual average of 1.2 per person (perhaps a little optimistic, but you have to get the numbers down).
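The total of 1294 reported above can be cross-checked by hand. The sketch below is an assumption about the kind of calculation involved, not a description of Statcalc's internals: it uses the normal-approximation formula for comparing two proportions with equal group sizes, applies a continuity correction (as described by Fleiss), and rounds up to whole participants. The function name and the rounding choice are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_cc(p_unexposed: float, p_exposed: float,
                   alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size per group for comparing two proportions (equal group
    sizes), normal approximation with continuity correction.
    Assumed to approximate Statcalc; not taken from its code."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # one-sided, for power
    p_bar = (p_unexposed + p_exposed) / 2
    diff = abs(p_unexposed - p_exposed)

    # Uncorrected sample size per group
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p_unexposed * (1 - p_unexposed)
                         + p_exposed * (1 - p_exposed))) ** 2 / diff ** 2

    # Continuity correction for equal group sizes
    n_cc = n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2
    return ceil(n_cc)


# 86% with at least one infection if untreated, 80% if treated
n_group = n_per_group_cc(0.86, 0.80)
print(n_group, 2 * n_group)  # per group, total
```

Under these assumptions the calculation gives 647 per group, or 1294 in total. Other rounding conventions can shift the answer by a participant or two per group, so small differences from a given program's output are to be expected.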
Your faithful statistician calculates that the proportion of people expected to have at least one infection is now 70%. See below:

[Figure: Poisson distribution, mean = 1.2; x-axis: number of infections; y-axis: probability mass]

Repeat the calculation in Statcalc. Now you need only 232 participants.

Your boss now informs you that the sample size is a bit more realistic, but will the reviewers of your grant believe that vitamin D will have such a dramatic effect? You hope so! The graph below illustrates power vs sample size for a variety of different event rates in the exposed (treated with vitamin D) group.

I hope you’ve been able to see from this exercise that sample size calculation is a compromise between a number of factors including resources, data available and various mathematical assumptions. It is a useful process to go through, because at the end, you understand well what effect you will be looking for, and how your study is limited by design and resource.

References

1. Jennings LC, MacDiarmid RD, Miles JAR. A study of acute respiratory disease in the community of Port Chalmers. I. Illnesses within a group of selected families and the relative incidence of respiratory pathogens in the whole community. Journal of Hygiene. 1978;81:49-66.
