7 - Discrete Random Variables

This is an empirical discrete probability distribution for X = # of pigs in a litter.

A discrete probability density function (pdf), also called a probability mass function, is defined as $f(x) = P(X = x)$. The Prob column contains estimated values of the probability function. If this study were repeated with a different sample of n = 378 litters, these estimates would be different.

Binomial and Poisson Distributions

Binomial Distribution and Random Variable

A binomial random variable X is defined to be the number of “successes” in n independent trials where P(success) = p is constant.

The definition above implies that the following conditions must be satisfied for a binomial experiment:
1. There is a fixed number n of trials carried out.
2. The outcome of a given trial is either a “success” or a “failure”.
3. The probability of success (p) remains constant from trial to trial: $P(\text{success}) = p$ and $P(\text{failure}) = 1 - p = q$.
4. The trials are independent: the outcome of a trial is not affected by the outcome of any other trial.

Binomial Probability Function

$$f(x) = P(X = x) = \binom{n}{x} p^x q^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x}, \qquad x = 0, 1, \ldots, n$$

The coefficient in front, $\binom{n}{x}$, is the number of ways to obtain x successes in n trials.

Example 1: A drug company claims that 10% of patients taking the drug will experience adverse side effects. To test this claim, a researcher administers the drug to a random sample of 20 patients. Let X = the number of patients in our sample who experience adverse side effects.
a) What is the probability that exactly 4 patients experience side effects?
b) What is the probability that 2 patients or fewer experience side effects?
c) What is the probability that 5 patients or fewer experience side effects?
d) What is the probability that 6 or more patients experience side effects?
e) Suppose that in our sample 6 patients experience side effects. What do you think about the claim made by the drug company on the basis of this result?
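The probabilities asked for in parts a) through d) follow directly from the binomial probability function with n = 20 and p = 0.10. The following is a minimal Python sketch using only the standard library. The function names (`binom_pmf`, `binom_cdf`, `binom_upper`) are illustrative choices and are not part of the course materials, which use JMP.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x): binomial probability function with n trials, success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x, n, p):
    """P(X <= x): sum of the probability function over 0, 1, ..., x."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))


def binom_upper(x, n, p):
    """P(X >= x), computed as 1 - P(X <= x - 1)."""
    return 1.0 - binom_cdf(x - 1, n, p)


# Example 1 setup: 20 patients, claimed side-effect rate of 10%.
n, p = 20, 0.10
print(f"a) P(X = 4)  = {binom_pmf(4, n, p):.4f}")
print(f"b) P(X <= 2) = {binom_cdf(2, n, p):.4f}")
print(f"c) P(X <= 5) = {binom_cdf(5, n, p):.4f}")
print(f"d) P(X >= 6) = {binom_upper(6, n, p):.4f}")
```

For part e), the quantity to look at is the upper-tail probability from part d): if the company's claim is true, observing 6 or more side effects among 20 patients has a probability of only about 1%, so such a result would be unusual under the claim.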
For a binomial random variable X, the mean, variance, and standard deviation of the number of successes are given by:

$$\mu = np, \qquad \sigma^2 = npq, \qquad \sigma = \sqrt{npq}$$

f) Suppose we gave the drug to 100 patients. If the drug company’s claim regarding side effects is true, how many side effects do we expect to observe, i.e. what is the mean or expected value of X?
g) What is the standard deviation of the number of side effects?

When n is sufficiently large, X = # of “successes” in the n trials is approximately normally distributed. As an example, consider the histogram below, which shows the simulated results of 10,000 clinical trials. In each trial, n = 100 patients are given a drug that has $p = P(\text{side effect}) = 0.10$, i.e. a 10% chance of causing a side effect, and the number of patients experiencing side effects is observed.

h) How many side effects would you have to observe to be convinced that the 10% side effect rate is wrong and that the true side effect rate is greater?

Binomial Distribution Table for n = 100 and p = .10

Example 2: Effect of Togetherness on Heart Rates of Rats
A researcher was interested in determining whether the heart rate of rats increases when they are in a cage with other rats versus when they are in a cage by themselves. The researcher thought this would be the case but wanted to conduct a study to determine whether this hypothesis was supported empirically. The following results were obtained:

Rat   Alone (A)   Together (T)   Difference = T - A   Sign of Difference
1     463         523
2     462         494
3     462         461
4     456         535            79
5     450         476            26
6     426         454            28
7     418         448            30
8     415         408            -7
9     409         470            61
10    402         437            35

What do we conclude regarding the research hypothesis?

Poisson Distribution and Random Variable

A Poisson random variable is X = # of occurrences in a specified unit of time or space. The assumptions we make about the process generating X are as follows:
1. Occurrences are independent.
2. Any number of occurrences is possible in a given time/space unit.
3. The probability of a single occurrence in a given interval is proportional to the length/size of the interval.
4. There are no simultaneous occurrences.
5. The expected number of occurrences during any one space/time unit is denoted by $\mu$. This is the same for all space/time units.

The Poisson distribution is also used to model the number of occurrences in a binomial experiment where the probability of success (p) is small and the number of trials (n) is large. In this case we can treat X = # of successes in n trials as having a Poisson distribution with $\mu = np$.

Poisson Probability Function

$$f(x) = P(X = x) = \frac{e^{-\mu}\,\mu^{x}}{x!}, \qquad x = 0, 1, 2, \ldots$$

Example 1: Sampling Organisms from a Pond
Suppose that the average number of a particular organism is expected to be 4 per 1-ml sample, i.e. $\mu = 4$/ml.
a) What is the probability of observing 2 organisms in a 1-ml sample?
b) What is the probability of observing fewer than 3 organisms?
c) What is the probability of observing more than 5 organisms?
d) Suppose a feedlot began operation near the pond. Two months after it first began operation, a 1-ml sample of water was taken and 10 organisms were found. What do you conclude on the basis of this result?

Example 2: Birth Defects in the Counties Surrounding a Nuclear Power Plant in Hanford, Washington
One of the important issues in assessing nuclear energy is whether there are excess disease risks in the communities surrounding nuclear power plants. A study was undertaken in the community surrounding Hanford, Washington, looking at the prevalence of selected congenital malformations in the counties surrounding the nuclear test facility in Hanford.
a) In a study conducted by Sever et al. (1988), 27 cases of Down’s syndrome were found, while only 19 were expected based on the Birth Defects Monitoring Program prevalence estimates for the states of Washington, Idaho, and Oregon. Is there a significant excess in the number of cases in the area surrounding the nuclear power plant?
b) Suppose that 12 cases of cleft palate are observed, while only 7 are expected based on Birth Defects Monitoring Program estimates. Does this represent a significant excess in the number of children born with cleft palates?

Binomial Table Generator in JMP
To use the Binomial Table Generator file in JMP, you simply need to change the number of trials (n) and the probability of success (p), which are labeled n and p in the table below. To change the n and p values, right-click at the top of the column and change the number in the formula to your desired value. The table will then automatically update and give the probabilities in the last four columns.

Poisson Table Generator in JMP
To use the Poisson Table Generator file in JMP, you simply need to change the mean rate of arrival or occurrence ($\mu$), which is labeled mu in the table below. To change the $\mu$ value, right-click at the top of the column and change the number in the formula to your desired value. The table will then automatically update and give the probabilities in the last four columns.

Additional Examples

1 - In the U.S. in 2007, 7.6% of infants born had birth weights classified as low (i.e. weight < 2,500 g). In a sample of n = 123 infants born to women who smoked during pregnancy, it was found that 14 had low birth weights (< 2,500 g). Does this provide evidence that the percentage of infants born with low birth weights to women who smoked during pregnancy exceeds the national rate?

2 - A study in Woburn, MA, in the 1970s looked at possible excess cancer risk in children, with a particular focus on leukemia. This study was later portrayed in the book and movie titled A Civil Action. An important environmental issue in the investigation concerned the possible contamination of the town’s water supply. Specifically, 12 cases of childhood leukemia were diagnosed in Woburn during the 1970s (Jan. 1, 1970 to Dec. 31, 1979).
A key statistical issue is whether this represents an excessive number of leukemia cases, assuming that Woburn had a constant 12,000 child residents during this period and that the national incidence rate of leukemia in children is 5 cases per 100,000 person-years. Can we conclude that there was a significant excess in the number of childhood leukemia cases in Woburn, MA, in the 1970s?
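The Poisson questions in this section (the pond sample, the Hanford birth defects, and the Woburn leukemia cases) all reduce to evaluating the Poisson probability function or an upper-tail probability P(X ≥ x). The following is a minimal Python sketch using only the standard library. The function and variable names are illustrative and not part of the course materials; the mean is written `mu`, matching the label in the JMP table. Two modeling assumptions are made here and should be read as such: the "expected" counts quoted in the examples are treated as the Poisson mean, and the Woburn mean is taken to be person-years multiplied by the national incidence rate.

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """P(X = x): Poisson probability function with mean mu."""
    return exp(-mu) * mu**x / factorial(x)


def poisson_cdf(x, mu):
    """P(X <= x): sum of the probability function over 0, 1, ..., x."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))


def poisson_upper(x, mu):
    """P(X >= x), computed as 1 - P(X <= x - 1)."""
    return 1.0 - poisson_cdf(x - 1, mu)


# Pond example: mean of 4 organisms per 1-ml sample.
mu_pond = 4
print(f"P(X = 2)   = {poisson_pmf(2, mu_pond):.4f}")
print(f"P(X < 3)   = {poisson_cdf(2, mu_pond):.4f}")
print(f"P(X > 5)   = {poisson_upper(6, mu_pond):.4f}")
print(f"P(X >= 10) = {poisson_upper(10, mu_pond):.4f}")

# Hanford example (assumption: the expected count is used as the Poisson mean).
print(f"Down's syndrome, P(X >= 27 | mu = 19) = {poisson_upper(27, 19):.4f}")
print(f"Cleft palate,    P(X >= 12 | mu = 7)  = {poisson_upper(12, 7):.4f}")

# Woburn example (assumption: mean = person-years x national incidence rate).
children = 12_000            # constant number of child residents
years = 10                   # Jan. 1, 1970 through Dec. 31, 1979
mu_woburn = children * years * 5 / 100_000   # 5 cases per 100,000 person-years
print(f"Woburn expected cases = {mu_woburn:g}")
print(f"Woburn, P(X >= 12)    = {poisson_upper(12, mu_woburn):.4f}")
```

In each case the question of a "significant excess" is answered by asking how likely it is to see a count at least as large as the one observed if the expected rate is correct. A small upper-tail probability suggests the observed count is hard to explain by chance alone.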