Section 5

STAT 405 - BIOSTATISTICS
Handout 5 – Methods for a Single Categorical Variable, Part III
EXAMPLE 1: Occupational Health
Many studies have looked at possible health hazards faced by rubber workers. In one such
study, a group of 8418 white male workers aged 40-84 (either active or retired) on January 1,
1964, were followed for 10 years for various mortality outcomes. Their mortality rates were
then compared with U.S. white male mortality rates in 1968. In one of the reported findings,
4 deaths due to Hodgkin’s disease were observed compared to 3.3 deaths expected from U.S.
mortality rates. Is this difference significant?
Question: Is the binomial probability model applicable in this situation? Explain.
THE POISSON PROBABILITY MODEL
The testing procedures discussed can be modified to use the Poisson distribution instead of the
binomial. Let the random variable X represent the number of times an event occurs in a given
time or area. The Poisson distribution can be used whenever the following assumptions are
met:
1. The probability the event occurs in a given unit of time or area is the same for all units.
2. The number of events that occur in one unit of time or area is independent of the
number occurring in other units.
The probability of k events occurring in a given time period or area for the Poisson random
variable with mean µ is given by the Poisson pdf:
P(X  k) 
e μ μ k
for k  0, 1,2...
k!
where µ = the expected number of events in a certain time period or area. The Poisson can also
be used to approximate binomial probabilities when n is “large” and p is “small” using np as
the mean rate of occurrence.
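As a quick numerical check of the formula and the approximation described above, the following R sketch evaluates the Poisson probability directly from e^(-µ) µ^k / k! and compares it with R's built-in dpois(). It then compares binomial and Poisson probabilities for one "large n, small p" case. The helper name poisson_pmf and the values n = 1000 and p = 0.002 are illustrative assumptions, not taken from the handout.

```r
# Poisson probability computed directly from the formula e^(-mu) * mu^k / k!
# (poisson_pmf is a hypothetical helper name used only for this illustration)
poisson_pmf <- function(k, mu) {
  exp(-mu) * mu^k / factorial(k)
}

mu <- 3.3          # mean that appears in Example 1 of the handout
k  <- 0:10

# The hand-coded formula should agree with R's built-in dpois()
by_formula <- poisson_pmf(k, mu)
built_in   <- dpois(k, lambda = mu)
max_diff   <- max(abs(by_formula - built_in))

# Poisson approximation to the binomial, using np as the mean.
# n and p are illustrative values chosen to be "large" and "small".
n <- 1000
p <- 0.002
binom_probs  <- dbinom(k, size = n, prob = p)
approx_probs <- dpois(k, lambda = n * p)
approx_error <- max(abs(binom_probs - approx_probs))

print(data.frame(k, binomial = binom_probs, poisson = approx_probs))
```

Trying smaller n or larger p in the second half shows how the agreement between the two columns degrades as the "large n, small p" condition is relaxed.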
EXAMPLE 2: Environmental Health, Obstetrics (Exercise 4.33, pg. 101 )
Suppose that the rate of major congenital malformations in the general population is 2.5
malformations per 100 deliveries. A study is set up to investigate whether offspring of
mothers who used marijuana during pregnancy have a higher rate of congenital
malformations. The researchers found of 75 offspring of mothers who used marijuana, 8
have a major congenital malformation. Is this evidence of excess risk of malformations in
this group of mothers?
Questions:
1. Is the binomial probability model applicable in this situation? Explain.
2. What is the exact probability of 8 malformations occurring in a sample of 75 offspring
born to mothers who smoked marijuana during pregnancy?
Using the binomial distribution with n = 75 and p = .025:
data BinomialProbabilities;
prob = pdf('Binomial', 8, .025, 75);
proc print noobs; run;
In R:
> dbinom(8,size=75,prob=.025)
[1] 0.0004720321
Using the Poisson distribution with µ = 1.875 (why?):
data PoissonProbabilities;
prob = pdf('Poisson', 8, 1.875);
proc print data=PoissonProbabilities noobs;
run;
In R:
> dpois(8,lambda=1.875)
[1] 0.0005810152
3. Is the occurrence of 8 congenital malformations unusual if the rate for this subpopulation of
infants is the same as that for the general population? To answer this question, find the
probability of observing 8 or more malformations.
Using binomial with n = 75 and p = .025
data BinomialProbabilities;
prob = 1 - cdf('Binomial', 7, .025, 75);
proc print noobs; run;
In R:
> 1 - pbinom(7,size=75,prob=.025)
[1] 0.0005800538
Using Poisson with congenital malformations per 75 infants
data PoissonProbabilities;
prob = 1-cdf('Poisson', 7, 1.875);
proc print data=PoissonProbabilities noobs;
run;
In R:
> 1 - ppois(7,lambda=1.875)
[1] 0.0007293296
EXAMPLE 1: Occupational Health
Let’s go back to Example 1, which dealt with health hazards faced by rubber workers. Recall that
4 deaths due to Hodgkin’s disease were observed compared with 3.3 deaths expected from U.S.
mortality rates. Is there an excess of Hodgkin’s disease in this population?
To answer this question, we will carry out a hypothesis test based on exact Poisson probabilities.
Step 0:
Check assumptions. For this test, you must check whether the Poisson
distribution is appropriate for the problem.
Step 1:
Set up your null and alternative hypotheses.
Ho:
Ha:
Step 2:
Set α = .05.
Step 3:
Use the Poisson distribution to find the exact p-value.
The following graphic shows the Poisson distribution for the number of deaths
due to Hodgkin’s disease over the 10 years, assuming the null hypothesis is true
(µ = 3.3):
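The distribution referred to here can be tabulated, and drawn in an interactive session, with a few lines of R. This is only a sketch: the display range 0 to 12 and the shading of the outcomes at or above the observed count of 4 deaths are presentation choices assumed for illustration.

```r
# Poisson probabilities for the number of deaths when mu = 3.3 (null hypothesis)
mu     <- 3.3
deaths <- 0:12                      # display range, chosen for illustration
probs  <- dpois(deaths, lambda = mu)

# Flag outcomes at least as large as the observed count of 4 deaths
observed <- 4
in_tail  <- deaths >= observed

null_dist <- data.frame(deaths, prob = round(probs, 4), in_tail)
print(null_dist)

# Draw the bar chart only when running interactively
if (interactive()) {
  barplot(probs, names.arg = deaths,
          col = ifelse(in_tail, "gray30", "gray80"),
          xlab = "Number of deaths", ylab = "Probability",
          main = "Poisson distribution, mu = 3.3")
}
```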
Recall that the p-value is the probability of observing a sample AT LEAST AS
EXTREME as our data, assuming the null hypothesis is true. For this example,
“at least as extreme” implies observing 4 or more deaths.
We can use R to find the probability of 4 or more deaths due to Hodgkin’s
disease:
> 1 - ppois(3,lambda=3.3)
[1] 0.4196618
Step 4:
Make a decision concerning the null hypothesis and write a conclusion in the
context of the original problem.
EXAMPLE 3: Occupational Health
In the rubber-worker data, there were 21 bladder cancer deaths and an expected number of
events from general population cancer mortality rates of 18.1. Is there evidence for either a
significant excess or deficit of bladder cancer cases?
Step 0:
Check assumptions. For this test, you must check whether the Poisson
distribution is appropriate for the problem.
Step 1:
Set up your null and alternative hypotheses.
Ho:
Ha:
Step 2:
Set α = .05.
Step 3:
Use the Poisson distribution to find the exact p-value.
Step 4:
Make a decision concerning the null hypothesis and write a conclusion
in the context of the original problem.
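For Step 3 of this example the alternative is two-sided, so the p-value must account for both tails. The handout does not say which two-sided convention to use, so the R sketch below shows two possibilities as assumptions: doubling the smaller tail probability (capped at 1), and the value reported by poisson.test(), which adds up the probabilities of all outcomes no more likely than the observed one. The two conventions need not give the same number.

```r
x  <- 21     # observed bladder cancer deaths
mu <- 18.1   # expected deaths under the null hypothesis

# Tail probabilities under H0: mu = 18.1
upper_tail <- 1 - ppois(x - 1, lambda = mu)   # P(X >= 21)
lower_tail <- ppois(x, lambda = mu)           # P(X <= 21)

# Convention 1 (assumption): double the smaller tail, capped at 1
p_doubled <- min(1, 2 * min(upper_tail, lower_tail))

# Convention 2: the two-sided p-value reported by poisson.test()
p_builtin <- poisson.test(x, r = mu)$p.value

print(c(doubled = p_doubled, poisson.test = p_builtin))
```

Either value is then compared with α = .05 in Step 4.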
Confidence Intervals Based on the Poisson Distribution
You can use the exact method discussed in Section 6.9 of your text:
An exact (1-α)100% confidence interval for the Poisson parameter µ is given by (µ1, µ2),
where µ1 and µ2 satisfy these equations:
P(X ≥ x | µ = µ1) = α/2
P(X ≤ x | µ = µ2) = α/2
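The two equations can also be solved numerically. The R sketch below does this with uniroot() for an observed count of x = 4, the Hodgkin's disease count from Example 1. The function name poisson_exact_ci, the search brackets, and the tolerance are assumptions made for illustration, and the function assumes x ≥ 1 (when x = 0 the lower limit is conventionally taken to be 0). As a cross-check, it uses the standard relationship between Poisson tail probabilities and chi-square quantiles.

```r
# Exact Poisson confidence limits found by solving the two tail equations.
# poisson_exact_ci is a hypothetical helper; it assumes x >= 1.
poisson_exact_ci <- function(x, conf_level = 0.95) {
  stopifnot(x >= 1)
  alpha <- 1 - conf_level

  # Lower limit mu1 solves P(X >= x | mu1) = alpha / 2
  lower_eq <- function(mu) (1 - ppois(x - 1, lambda = mu)) - alpha / 2
  # Upper limit mu2 solves P(X <= x | mu2) = alpha / 2
  upper_eq <- function(mu) ppois(x, lambda = mu) - alpha / 2

  # Search brackets are assumptions chosen so each function changes sign
  mu1 <- uniroot(lower_eq, lower = 1e-10, upper = x, tol = 1e-10)$root
  mu2 <- uniroot(upper_eq, lower = x, upper = 5 * x + 20, tol = 1e-10)$root
  c(lower = mu1, upper = mu2)
}

ci <- poisson_exact_ci(4)

# Cross-check: the same limits expressed through chi-square quantiles
ci_chisq <- c(lower = qchisq(0.025, df = 2 * 4) / 2,
              upper = qchisq(0.975, df = 2 * (4 + 1)) / 2)

print(rbind(ci, ci_chisq))
```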
EXAMPLE: Once again, let’s consider the example where 4 deaths due to Hodgkin’s disease
were observed compared with 3.3 deaths expected from U.S. mortality rates. To calculate the
confidence interval in SAS, you can use the following program:
data PoissonExactCIs;
x = 4;
*Input this--the observed number of events;
do i = 0 to 20 by .1;
lower = 1-cdf('Poisson', x-1, i);
upper = cdf('Poisson', x, i);
output;
end;
proc print noobs data=PoissonExactCIs; run;
Usually with exact confidence intervals, we cannot exactly satisfy α/2 in each tail. Instead, we
use a more conservative approach:

• Find the largest value of µ1 so that P(X ≥ x | µ = µ1) ≤ α/2
• Find the smallest value of µ2 so that P(X ≤ x | µ = µ2) ≤ α/2
We can also do one-sided CI’s for µ, as with the binomial probability of success, by using α in
place of α/2 and using either the lower or upper bound formulas.
Using these guidelines and the above SAS output, write the 95% confidence interval for µ = the
expected number of deaths due to Hodgkin’s disease in the 10-year period for the rubber
workers:
…
Poisson Exact Test and CI in R
Example 1: Hodgkin’s Disease
> poisson.test(4,r=3.3)
Exact Poisson test
data: 4 time base: 1
number of events = 4, time base = 1, p-value = 0.5783
alternative hypothesis: true event rate is not equal to 3.3
95 percent confidence interval:
1.089865 10.241589
sample estimates:
event rate
4
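Note that poisson.test() is two-sided by default, whereas the question in Example 1 asked about an excess of deaths. As a sketch, the one-sided version can be requested with the alternative argument, and it should reproduce the upper-tail probability computed earlier with ppois().

```r
# One-sided exact Poisson test for Example 1: H0: mu = 3.3 versus Ha: mu > 3.3
one_sided <- poisson.test(4, r = 3.3, alternative = "greater")

# Same quantity computed directly from the upper tail, P(X >= 4 | mu = 3.3)
direct <- 1 - ppois(3, lambda = 3.3)

print(c(poisson.test = one_sided$p.value, direct = direct))
```

The confidence interval reported with alternative = "greater" is also one-sided, so it differs from the two-sided interval shown in the output above.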
Example 2: Environmental Health, Obstetrics
> poisson.test(8,r=1.875)
Exact Poisson test
data: 8 time base: 1
number of events = 8, time base = 1, p-value = 0.0007293
alternative hypothesis: true event rate is not equal to 1.875
95 percent confidence interval:
3.453832 15.763189
sample estimates:
event rate
8
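Because Example 2 was also worked with the binomial distribution, it may help to see the exact binomial test next to the Poisson one. The sketch below assumes the one-sided alternative (a higher malformation rate) posed in the example, so its p-values correspond to the "8 or more" probabilities computed earlier.

```r
# Exact binomial test for Example 2: 8 malformations among 75 offspring,
# H0: p = 0.025 versus Ha: p > 0.025
binom_result <- binom.test(8, n = 75, p = 0.025, alternative = "greater")

# Exact Poisson counterpart with mu = 75 * 0.025 = 1.875
pois_result <- poisson.test(8, r = 75 * 0.025, alternative = "greater")

print(c(binomial = binom_result$p.value, poisson = pois_result$p.value))
```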