STAT 405 - BIOSTATISTICS
Handout 5 – Methods for a Single Categorical Variable, Part III

EXAMPLE 1: Occupational Health

Many studies have looked at possible health hazards faced by rubber workers. In one such study, a group of 8418 white male workers aged 40-84 (either active or retired) on January 1, 1964, were followed for 10 years for various mortality outcomes. Their mortality rates were then compared with U.S. white male mortality rates in 1968. In one of the reported findings, 4 deaths due to Hodgkin’s disease were observed compared to 3.3 deaths expected from U.S. mortality rates. Is this difference significant?

Question: Is the binomial probability model applicable in this situation? Explain.

THE POISSON PROBABILITY MODEL

The testing procedures discussed can be modified to use the Poisson distribution instead of the binomial. Let the random variable X represent the number of times an event occurs in a given time or area. The Poisson distribution can be used whenever the following assumptions are met:

1. The probability the event occurs in a given unit of time or area is the same for all units.
2. The number of events that occur in one unit of time or area is independent of the number occurring in other units.

The probability of k events occurring in a given time period or area for the Poisson random variable with mean µ is given by the Poisson pdf:

$$P(X = k) = \frac{e^{-\mu}\mu^{k}}{k!}, \quad k = 0, 1, 2, \ldots$$

where µ = the expected number of events in a certain time period or area.

The Poisson can also be used to approximate binomial probabilities when n is “large” and p is “small” using np as the mean rate of occurrence.

EXAMPLE 2: Environmental Health, Obstetrics (Exercise 4.33, pg. 101)

Suppose that the rate of major congenital malformations in the general population is 2.5 malformations per 100 deliveries. A study is set up to investigate whether offspring of mothers who used marijuana during pregnancy have a higher rate of congenital malformations.
The researchers found that, of 75 offspring of mothers who used marijuana, 8 had a major congenital malformation. Is this evidence of excess risk of malformations in this group of mothers?

Questions:

1. Is the binomial probability model applicable in this situation? Explain.

2. What is the exact probability of 8 malformations occurring in a sample of 75 offspring born to mothers who smoked marijuana during pregnancy?

Using the binomial distribution with n = 75 and p = .025:

    data BinomialProbabilities;
      prob = pdf('Binomial', 8, .025, 75); *P(X = 8) for n = 75, p = .025;
    run;
    proc print data=BinomialProbabilities noobs;
    run;

In R:

    > dbinom(8,size=75,prob=.025)
    [1] 0.0004720321

Using the Poisson distribution with µ = np = 75(.025) = 1.875 (why?):

    data PoissonProbabilities;
      prob = pdf('Poisson', 8, 1.875); *P(X = 8) for mean 1.875;
    run;
    proc print data=PoissonProbabilities noobs;
    run;

In R:

    > dpois(8,lambda=1.875)
    [1] 0.0005810152

3. Is the occurrence of 8 congenital malformations unusual if the rate for this subpopulation of infants is the same as that for the general population? To answer this question, find the probability of observing 8 or more malformations.

Using binomial with n = 75 and p = .025:

    data BinomialProbabilities;
      prob = 1 - cdf('Binomial', 7, .025, 75); *P(X >= 8) = 1 - P(X <= 7);
    run;
    proc print data=BinomialProbabilities noobs;
    run;

In R:

    > 1 - pbinom(7,size=75,prob=.025)
    [1] 0.0005800538

Using Poisson with µ = 1.875 congenital malformations per 75 infants:

    data PoissonProbabilities;
      prob = 1-cdf('Poisson', 7, 1.875); *P(X >= 8) = 1 - P(X <= 7);
    run;
    proc print data=PoissonProbabilities noobs;
    run;

In R:

    > 1 - ppois(7,lambda=1.875)
    [1] 0.0007293296

EXAMPLE 1: Occupational Health

Let’s go back to Example 1 dealing with health hazards faced by rubber workers. Recall that 4 deaths due to Hodgkin’s disease were observed compared with 3.3 deaths expected from U.S. mortality rates. Is there an excess of Hodgkin’s disease in this population?

To answer this question, we will carry out a hypothesis test based on exact Poisson probabilities.

Step 0: Check assumptions. For this test, you must check whether the Poisson distribution is appropriate for the problem.
Step 1: Set up your null and alternative hypotheses.

Ho:
Ha:

Step 2: Set α = .05.

Step 3: Use the Poisson distribution to find the exact p-value.

The following graphic shows the Poisson distribution for the number of deaths due to Hodgkin’s disease over the 10 years, assuming the null hypothesis is true (µ = 3.3):

[Figure: Poisson distribution of the number of deaths, µ = 3.3]

Recall that the p-value is the probability of observing a sample AT LEAST AS EXTREME as our data, assuming the null hypothesis is true. For this example, “at least as extreme” implies observing 4 or more deaths. We can use R to find the probability of 4 or more deaths due to Hodgkin’s disease:

    > 1 - ppois(3,lambda=3.3)
    [1] 0.4196618

Step 4: Make a decision concerning the null hypothesis and write a conclusion in the context of the original problem.

EXAMPLE 3: Occupational Health

In the rubber-worker data, there were 21 bladder cancer deaths, compared with 18.1 expected from general population cancer mortality rates. Is there evidence for either a significant excess or deficit of bladder cancer cases?

Step 0: Check assumptions. For this test, you must check whether the Poisson distribution is appropriate for the problem.

Step 1: Set up your null and alternative hypotheses.

Ho:
Ha:

Step 2: Set α = .05.

Step 3: Use the Poisson distribution to find the exact p-value.

Step 4: Make a decision concerning the null hypothesis and write a conclusion in the context of the original problem.

Confidence Intervals Based on the Poisson Distribution

You can use the exact method discussed in Section 6.9 of your text:

An exact (1-α)100% confidence interval for the Poisson parameter µ is given by (µ1, µ2), where µ1 and µ2 satisfy these equations:

    P(X ≥ x | µ = µ1) = α/2
    P(X ≤ x | µ = µ2) = α/2

EXAMPLE: Once again, let’s consider the example where 4 deaths due to Hodgkin’s disease were observed compared with 3.3 deaths expected from U.S. mortality rates.
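As a minimal sketch, the two tail equations above can be solved numerically in R for x = 4 and α = .05. The use of uniroot, the helper names lower_eq and upper_eq, and the search interval are illustrative assumptions, not part of the text's method.

```r
# Observed count and significance level (Hodgkin's disease example)
x <- 4
alpha <- 0.05

# Lower limit: root of P(X >= x | mu) - alpha/2
# (lower_eq is a hypothetical helper name)
lower_eq <- function(mu) ppois(x - 1, mu, lower.tail = FALSE) - alpha / 2

# Upper limit: root of P(X <= x | mu) - alpha/2
# (upper_eq is a hypothetical helper name)
upper_eq <- function(mu) ppois(x, mu) - alpha / 2

# The search interval (1e-8, 50) is an assumption; it only needs to
# bracket both roots for x = 4
mu1 <- uniroot(lower_eq, interval = c(1e-8, 50), tol = 1e-10)$root
mu2 <- uniroot(upper_eq, interval = c(1e-8, 50), tol = 1e-10)$root

c(lower = mu1, upper = mu2)
```

Because this solves the equations to high precision, the limits will generally differ slightly from those read off a table computed on a coarse grid of µ values.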
To calculate the confidence interval in SAS, you can use the following program:

    data PoissonExactCIs;
      x = 4; *Input this--the observed number of events;
      do i = 0 to 20 by .1; *i is the candidate value of the mean;
        lower = 1-cdf('Poisson', x-1, i); *P(X >= x) at mean i;
        upper = cdf('Poisson', x, i); *P(X <= x) at mean i;
        output;
      end;
    run;
    proc print noobs data=PoissonExactCIs;
    run;

Usually with exact confidence intervals, we cannot exactly satisfy α/2 in each tail. Instead, we use a more conservative approach:

    Find the largest value of µ1 so that P(X ≥ x | µ = µ1) ≤ α/2
    Find the smallest value of µ2 so that P(X ≤ x | µ = µ2) ≤ α/2

We can also do one-sided CI’s for µ, as with the binomial probability of success p, by using α in place of α/2 and using either the lower or upper bound formula.

Using these guidelines and the above SAS output, write the 95% confidence interval for µ = the expected number of deaths due to Hodgkin’s disease in the 10-year period for the rubber workers: …

Poisson Exact Test and CI in R

Example 1: Hodgkin’s Disease

    > poisson.test(4,r=3.3)

            Exact Poisson test

    data:  4 time base: 1
    number of events = 4, time base = 1, p-value = 0.5783
    alternative hypothesis: true event rate is not equal to 3.3
    95 percent confidence interval:
      1.089865 10.241589
    sample estimates:
    event rate
             4

Example 2: Environmental Health, Obstetrics

    > poisson.test(8,r=1.875)

            Exact Poisson test

    data:  8 time base: 1
    number of events = 8, time base = 1, p-value = 0.0007293
    alternative hypothesis: true event rate is not equal to 1.875
    95 percent confidence interval:
      3.453832 15.763189
    sample estimates:
    event rate
             8
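Note that poisson.test reports a two-sided p-value by default, whereas the Hodgkin’s disease test earlier in this handout was one-sided (4 or more deaths). The sketch below, which assumes the one-sided alternative Ha: µ > 3.3, shows how the alternative argument of poisson.test can be used to match the upper-tail calculation done with ppois. The variable names are illustrative.

```r
# Two-sided (default) exact Poisson test for 4 observed vs. 3.3 expected
two_sided <- poisson.test(4, r = 3.3)$p.value

# One-sided test of Ho: mu = 3.3 against Ha: mu > 3.3
# (the direction of Ha is an assumption made for this sketch)
one_sided <- poisson.test(4, r = 3.3, alternative = "greater")$p.value

# Direct upper-tail calculation, P(X >= 4 | mu = 3.3)
by_hand <- 1 - ppois(3, lambda = 3.3)

round(c(two_sided = two_sided, one_sided = one_sided, by_hand = by_hand), 4)
```

The one-sided value should agree with the ppois calculation, while the two-sided value is larger because it also counts outcomes in the lower tail that are no more probable than the observed count.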