MATH 233
Chapter 4
Probability Distributions of Discrete Variables
_____________________________________________________________

Random Variables

Definition: A random variable is a function that assigns a numeric value to each event in a sample space.

Example: Coin Toss

Event   Head   Tail
X       0      1

Definition: The probability distribution of a discrete random variable is a table used to specify all possible values of a discrete random variable along with their respective probabilities.

If $x_1, x_2, x_3, \ldots, x_k$ are all possible values of the discrete random variable X, then a probability distribution of a discrete variable has the following two essential properties:

(1) $0 \le P(X = x) \le 1$, for all x.
(2) $\sum P(X = x) = 1$.

Practice Exercise 1: Explain why each of the following distributions is or is not a probability distribution:

Mean and Variance of Discrete Probability Distributions: The formulas are below. The standard deviation σ is the square root of the variance. The mean can be written as μ; the subscript x is not required.

$\mu = \sum x\,P(X = x)$
$\sigma^2 = \sum x^2\,P(X = x) - \mu^2$

Practice Exercise 2: Let X be a discrete random variable taking values 0, 1, 2, 3, 4 or 5.

x          0      1      2      3      4      5      Total
P(X = x)   0.75          0.05   0.04   0.02   0.01

a) Fill in the table so that it is a probability distribution.
b) Determine the mean, standard deviation and variance of the distribution.
c) Compute the following probabilities: P(X ≥ 3), P(X = 2), P(X ≤ 4), P(X > 4), P(1 ≤ X ≤ 4), P(X = 0)

The Binomial Distribution

When a random process or experiment, called a trial, can result in only one of two mutually exclusive outcomes, such as dead or alive, sick or well, full-term or premature, the trial is called a Bernoulli trial.

The Bernoulli Process

A sequence of Bernoulli trials forms a Bernoulli process under the following conditions.
1. Each trial results in one of two possible, mutually exclusive, outcomes. One of the possible outcomes is denoted (arbitrarily) as a success, and the other is denoted a failure.
2.
The probability of a success, denoted by p, remains constant from trial to trial. The probability of a failure, 1 – p, is denoted by q.
3. The trials are independent; that is, the outcome of any particular trial is not affected by the outcome of any other trial.

The binomial distribution has two parameters, n and p, where n represents the number of independent Bernoulli trials and p is the probability of success. The probability of failure is denoted by q = 1 – p. The binomial probability of x successes in n trials is given by

$P(X = x) = \dfrac{n!}{x!\,(n-x)!}\,p^x q^{n-x} = {}_nC_x\,p^x q^{n-x}$, for x = 0, 1, 2, ..., n.

An uppercase symbol (X) denotes a random variable; a lowercase symbol (x) denotes a value that the variable takes.

The mean, variance and standard deviation of the binomial distribution are $\mu = np$, $\sigma^2 = np(1-p) = npq$ and $\sigma = \sqrt{npq}$, respectively.

TI Calculator Commands: X is a discrete random variable following a Binomial distribution with parameters n and p.
Press 2nd VARS
binompdf(n, p, x) for exact probability
binomcdf(n, p, x) for cumulative probability

Exercise A: 32% of people in a certain town have diabetes. A sample of 15 people is taken. Let X be the number in the sample with diabetes. Find the following probabilities:
a) The probability that exactly 3 people in the sample have diabetes.
b) The probability that less than 5 have diabetes.
c) The probability that between 5 and 9 (inclusive) have diabetes.
d) P(5 < X < 10)
e) The probability that at least 6 have diabetes.

Exercise B: Suppose it is known that 10 percent of a certain population is color blind. If a random sample of 25 people is drawn from this population, find the probability that:
a) Three or fewer will be color blind.
b) Four or more will be color blind.
c) Between three and six inclusive will be color blind.
d) Two, three, or four will be color blind.

Practice Exercise 3: Allergic rhinitis is an inflammation of the nasal airways that occurs when an allergen, such as pollen or dust, is inhaled by an individual.
It is known that 12% of the members of a certain population with a sensitized immune system are highly susceptible to allergic rhinitis. A random sample of n = 15 subjects is drawn from this population. What is the probability that
a) exactly 3 will have allergic rhinitis?
Binomial with n = 15 and p = 0.12
x   P( X = x )
3   0.1696
b) between zero and three, inclusive, will have allergic rhinitis?
Binomial with n = 15 and p = 0.12
x   P( X <= x )
3   0.9041
c) Determine the mean and standard deviation.
$\mu = np$, $\sigma = \sqrt{npq}$

Practice Exercise 4: Antibiotic resistance occurs when disease-causing microbes become resistant to antibiotic drug therapy. Because this resistance is typically genetic and transferred to the next generation of microbes, it is a very serious public health problem. According to the Centers for Disease Control (CDC), 7% of gonorrhea cases treated in 2004 were resistant to the antibiotic ciprofloxacin. A physician prescribed ciprofloxacin for the treatment of 10 cases of gonorrhea during one week in 2004. (Source: The Practice of Statistics in the Life Sciences, 2nd Ed., by Baldi and Moore)
a) What is the distribution of the cases resistant to ciprofloxacin?
b) What is the probability that exactly 1 out of the 10 cases was resistant to ciprofloxacin? What is the probability for exactly 2 out of 10?
c) What is the probability that at most 3 out of the 10 cases were resistant to ciprofloxacin?
d) What is the probability that 1 or more of the 10 cases were resistant to ciprofloxacin? (Hint: It is easier to first find the probability that exactly 0 of the 10 cases were resistant.)
e) What is the mean number of gonorrhea cases that are resistant to the antibiotic ciprofloxacin out of 10 cases? What is the standard deviation of the count of antibiotic-resistant cases?
Solution:

Practice Exercise 5: The probability is 0.314 that the gestation period of a woman will exceed 9 months.
In six human births, what is the probability that the number in which the gestation period exceeds 9 months is
a) exactly three?
b) exactly five?
c) at least five?
d) between three and five, inclusive?

The Poisson Distribution

If x is the number of independent occurrences of some random event in an interval of time or space, the probability of x occurrences is given by

$f(x) = P(X = x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}$, for x = 0, 1, 2, ...

The Greek letter λ (lambda) is called the parameter of the distribution and is the average number of occurrences of the random event in the interval. The symbol e is the constant (to four decimals) 2.7183.

The mean, variance and standard deviation of the Poisson distribution are $\mu = \lambda$, $\sigma^2 = \lambda$ and $\sigma = \sqrt{\lambda}$, respectively.

TI Calculator Commands: X is a discrete random variable following a Poisson distribution with parameter λ.
Press 2nd VARS
poissonpdf(µ, x) for exact probability
poissoncdf(µ, x) for cumulative probability

Example C: In a study of drug-induced anaphylaxis among patients taking rocuronium bromide as part of their anesthesia, researchers found that the occurrence of anaphylaxis followed a Poisson model with λ = 12 incidents per year in Norway. Find the probability that in the next year, among patients receiving rocuronium, exactly three will experience anaphylaxis.

Example D: Refer to Example C. What is the probability that at least three patients in the next year will experience anaphylaxis if rocuronium is administered with anesthesia?

Exercise E: Researchers looked at the occurrence of retinal capillary hemangioma (RCH) in patients with von Hippel–Lindau (VHL) disease. RCH is a benign vascular tumor of the retina. Using a retrospective consecutive case series review, the researchers found that the number of RCH tumor incidents followed a Poisson distribution with λ = 4 tumors per eye for patients with VHL. Using this model, find the probability that in a randomly selected patient with VHL:
a) There are exactly five occurrences of tumors per eye.
b) There are more than five occurrences of tumors per eye.
c) There are fewer than five occurrences of tumors per eye.
d) There are between five and seven occurrences of tumors per eye, inclusive.

Exercise F: In a certain population an average of 13 new cases of esophageal cancer are diagnosed each year. If the annual incidence of esophageal cancer follows a Poisson distribution, find the probability that in a given year the number of newly diagnosed cases of esophageal cancer will be:
a) Exactly 10
b) At least three
c) No more than three
d) Between 12 and 15, inclusive
e) Fewer than four

Exercise G: In a study of the relationship between measles vaccination and Guillain-Barré syndrome (GBS), Silveira et al. used a Poisson model in the examination of the occurrence of GBS during latent periods after vaccinations. They conducted their study in Argentina, Brazil, Chile, and Colombia. They found that during the latent period, the rate of GBS was λ = 1.3 cases per day. Using this estimate, find the probability on a given day of:
a) No cases of GBS
b) At least one case of GBS
c) Fewer than five cases of GBS

Practice Exercise 6: The number of cases of tetanus reported in Canada during 2011 has a Poisson distribution with parameter λ = 4.
a) What is the probability that exactly three cases of tetanus will be reported during a given year?
Probability Density Function
Poisson with mean = 4
x   P( X = x )
3   0.195367
P(X = 3) = 0.1954
b) What is the probability that four or more cases of tetanus will be reported?
Cumulative Distribution Function
Poisson with mean = 4
x   P( X <= x )
3   0.433470
P(X ≥ 4) = 1 – P(X ≤ 3) = 1 – 0.4335 = 0.5665

TEST 1 COVERS UP TO HERE

Chapter 5
Probability Distributions of Continuous Variables

Continuous Probability Distributions

The probability distributions considered thus far, the binomial and the Poisson, are distributions of discrete variables. Let us now consider distributions of continuous random variables.
A continuous variable is one that can assume any value within a specified interval of values assumed by the variable. Consequently, between any two values assumed by a continuous variable, there exist an infinite number of values.

If a continuous random variable has a distribution with a graph that is symmetric and bell-shaped, as in the figure below, we say that it has a normal distribution.

Characteristics of the Normal Distribution

The following are some important characteristics of the normal distribution.
1. It is symmetrical about its mean μ; the curve on either side of μ is a mirror image of the other side. 50 percent of the area is to the right of a perpendicular erected at the mean, and 50 percent is to the left.
2. The mean, the median, and the mode are all equal.
3. The total area under the curve above the x-axis is one square unit. This characteristic follows from the fact that the normal distribution is a probability distribution.
4. One-Sigma Rule: Approximately 68 percent of the data values should lie within one standard deviation of the mean. See Figure 4.6.2 (a).
   Two-Sigma Rule: Approximately 95 percent of the data values should lie within two standard deviations of the mean. See Figure 4.6.2 (b).
   Three-Sigma Rule: Approximately 99.7 percent of the data values should lie within three standard deviations of the mean. See Figure 4.6.2 (c).
5. The normal distribution is completely determined by the parameters μ and σ. Because of the characteristics of these two parameters, μ is often referred to as a location parameter and σ is often referred to as a shape parameter.

Practice Exercise 7: Given that a sample is approximately bell-shaped with a mean of 120 and a standard deviation of 10, the approximate percentage of data values that is expected to fall between 100 and 140 is
a) 75 percent.
b) 95 percent.
c) 68 percent.
d) 99.7 percent.
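The sigma rules above can be checked numerically. The sketch below is illustrative only and is not part of the course's calculator workflow; it assumes Python's standard `math` module, and the helper names `within_k_sigma` and `normal_cdf` are hypothetical names introduced here. It relies on the identity P(|Z| < k) = erf(k/√2) for a standard normal Z.

```python
import math

def within_k_sigma(k: float) -> float:
    """Area under any normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for a normal X with mean mu and standard deviation sigma."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# One-, two- and three-sigma rules
for k in (1, 2, 3):
    print(f"within {k} sigma: {within_k_sigma(k):.4f}")

# Setting of Practice Exercise 7 (mean 120, standard deviation 10):
# 100 and 140 lie two standard deviations on either side of the mean.
area = normal_cdf(140, 120, 10) - normal_cdf(100, 120, 10)
print(f"P(100 < X < 140) = {area:.4f}")
```

The printed areas are the exact values that the 68, 95 and 99.7 percent figures approximate.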
The Standard Normal Distribution

The most important member of this family is the standard normal distribution, or unit normal distribution, as it is sometimes called. It may be obtained by creating a random variable

$z = \dfrac{x - \mu}{\sigma}$

The following figure illustrates the conversion from a nonstandard to a standard normal distribution.

The standard normal distribution is a normal probability distribution with μ = 0 and σ = 1. The total area under its density curve is equal to 1.

Notation
P(a < z < b) denotes the probability that the z score is between a and b.
P(z > a) denotes the probability that the z score is greater than a.
P(z < a) denotes the probability that the z score is less than a.

Practice Exercise 8: Which of the following is not true about the standard normal distribution?
a) The total probability under the standard normal curve is 1.
b) The standard normal curve is symmetric about 0.
c) About 68% of its observations fall between -1 and 1.
d) The probability under the standard normal curve to the left of z = 0 is negative. (Probability can’t be negative.)
e) About 95% of its observations fall between -2 and 2.

Methods for Finding Normal Distribution Areas

Find the following probabilities, using the z table.
P( z < 2 ) =
P( z <= 2 ) =
P( z < 0 ) =
P( z < 1.5 ) =
P( z > 2 ) =
P( z > 0.23 ) =
P( z > -1.31 ) =
P( -2 < z < 2 ) =
P( -1.5 < z < 1.5 ) =
P( -1.23 < z < 2.34 ) =
P( z = 0 ) =
P( z = 1.35 ) =

Rules:
P(z > a) = 1 – P(z < a)
P(a < z < b) = P(z < b) – P(z < a)

Normal Distribution Applications

Example H: Diskin et al. (A-11) studied common breath metabolites such as ammonia, acetone, isoprene, ethanol, and acetaldehyde in five subjects over a period of 30 days. Each day, breath samples were taken and analyzed in the early morning on arrival at the laboratory. For subject A, a 27-year-old female, the ammonia concentration in parts per billion (ppb) followed a normal distribution over 30 days with mean 491 and standard deviation 119.
What is the probability that on a random day, the subject’s ammonia concentration is between 292 and 649 ppb?

Exercise: Suppose the average length of stay in a chronic disease hospital of a certain type of patient is 60 days with a standard deviation of 15. If it is reasonable to assume an approximately normal distribution of lengths of stay, find the probability that a randomly selected patient from this group will have a length of stay:
a) Greater than 50 days
b) Less than 30 days
c) Between 30 and 60 days
d) Greater than 90 days

Example I: If the total cholesterol values for a certain population are approximately normally distributed with a mean of 200 mg/100 ml and a standard deviation of 20 mg/100 ml, find the probability that an individual picked at random from this population will have a cholesterol value:
a) Between 180 and 200 mg/100 ml
b) Greater than 225 mg/100 ml
c) Less than 150 mg/100 ml
d) Between 190 and 210 mg/100 ml

Practice Exercise 9: The distribution of bladder volume in men is approximately Normal with mean µ = 550 ml and standard deviation σ = 100 ml.
a) What proportion of male bladders are larger than 520 ml?
Cumulative Distribution Function
Normal with mean = 550 and standard deviation = 100
x     P( X <= x )
520   0.382089
b) What proportion of male bladders are between 530 and 560 ml?

Exercise J: A nurse supervisor has found that staff nurses, on the average, complete a certain task in 10 minutes.
If the times required to complete the task are approximately normally distributed with a standard deviation of 3 minutes, find:
a) The proportion of nurses completing the task in less than 4 minutes
b) The proportion of nurses requiring more than 5 minutes to complete the task
c) The probability that a nurse who has just been assigned the task will complete it within 3 minutes

Practice Exercise 10: Consider the normal curves that have the parameters
i) μ = 1.5 and σ = 3
ii) μ = 1.5 and σ = 6.2
iii) μ = -2.7 and σ = 3
iv) μ = 0 and σ = 1
a) Which curve has the largest spread?
b) Which curves are centered at the same place?
c) Which curves have the same shape?
d) Which curve is centered farthest to the left?
e) Which curve is the standard normal curve?

Another Standard Question: Imagine that the mean male height is 70 inches, and the standard deviation is 4 inches.
a) What percent of men are less than 60 inches tall?
b) 50% of men are less than _______ inches?
c) 25% of men are less than _______ inches?

Practice Exercise 11: The brain weights of a certain population of adult Swedish males follow approximately a normal distribution with mean 1,400 gm and standard deviation 100 gm. What percentage of the brain weights are
a) 1,500 gm or less?
b) between 1,325 and 1,500 gm?
c) 1,325 gm or more?
d) between 1,200 and 1,600 gm?

Practice Exercise 12: The serum cholesterol levels of 12- to 14-year-olds follow a normal distribution with mean 162 mg/dl and standard deviation 28 mg/dl. What percentage of 12- to 14-year-olds have serum cholesterol values
a) 171 or more?
b) 143 or less?
c) 194 or less?
d) 105 or more?
e) between 166 and 194?
f) between 105 and 138?
g) between 138 and 166?

MATH 2333: Statistics for Life Sciences
Practice Quiz (Chapter 4.8 – Binomial Distribution)
Name: ________________

1) Neuroblastoma is an extracranial solid cancer in childhood and the most common cancer in infancy, a serious but treatable disease.
A urine test called the VMA test has been developed that gives a positive diagnosis in about 70% of cases of neuroblastoma. Assume that a large number of children are to be tested, of whom n = 8 have the disease. We are interested in whether or not the test detects the disease in the 8 children who have the disease. Find the probability that
a) none of the eight cases will be detected.
b) all eight cases will be detected.
c) only one case will be missed.
d) between two and five cases, inclusive, will be detected.

2) Childhood lead poisoning is a public health concern. In a certain population, 1 child in 8 has a high blood lead level (defined as 30 μg/dl or more). In a randomly chosen group of n = 16 children from the population, what is the probability that
a) none has high blood lead?
b) 1 has high blood lead?
c) 2 have high blood lead?
d) 3 or more have high blood lead? [Hint: Use parts a) – c).]

Chapter Summary

X ~ Bin(n, p): X is a discrete random variable following a Binomial distribution with parameters n and p.
$\mu = np$, $\sigma^2 = npq$, $\sigma = \sqrt{npq}$
TI Calculator Commands: Press 2nd VARS
binompdf(n, p, x) for exact probability
binomcdf(n, p, x) for cumulative probability

X ~ Poisson(λ): X is a discrete random variable following a Poisson distribution with parameter λ.
TI Calculator Commands: Press 2nd VARS
poissonpdf(µ, x) for exact probability
poissoncdf(µ, x) for cumulative probability

X ~ N(μ, σ²): X is a continuous random variable following a Normal distribution with parameters µ and σ.
Z ~ N(0, 1): Z is a continuous random variable following the Standard Normal distribution with parameters 0 and 1.
TI Calculator Commands: Press 2nd VARS
normalcdf(minimum value, maximum value)

MATH 2333: Statistics for Life Sciences
Chapter 4, 5
Discrete and Continuous Probability Distributions
Solutions to Selected Problems
_____________________________________________________________

Practice Exercise 1: Explain why each of the following distributions is or is not a probability distribution:
Not a probability distribution since $\sum P(X = x) \ne 1$.
Not a probability distribution since $0 \le P(X = x) \le 1$ does not hold for every x.
Not a probability distribution since $\sum P(X = x) \ne 1$.
Is a probability distribution since $0 \le P(X = x) \le 1$ and $\sum P(X = x) = 1$.

Practice Exercise 2:

x          0      1      2      3      4      5      Total
p(x)       0.75   0.13   0.05   0.04   0.02   0.01   1
x p(x)     0      0.13   0.1    0.12   0.08   0.05   0.48
x²         0      1      4      9      16     25
x² p(x)    0      0.13   0.2    0.36   0.32   0.25   1.26

$\mu = \sum x\,p(x) = 0.48$
$\sigma^2 = \sum x^2 p(x) - \mu^2 = 1.26 - 0.48^2 = 1.0296$
$\sigma = \sqrt{1.0296} = 1.0147$

P(X ≥ 3) = 0.07
P(X = 2) = 0.05
P(X ≤ 4) = 0.99
P(X > 4) = 0.01
P(1 ≤ X ≤ 4) = 0.24
P(X = 0) = 0.75

Exercise A:
a) $P(X = 3) = \binom{15}{3}(0.32)^3(0.68)^{12} = 0.1457$
b) $P(X < 5) = P(X=0) + P(X=1) + P(X=2) + P(X=3) + P(X=4)$
$= \binom{15}{0}(0.32)^0(0.68)^{15} + \binom{15}{1}(0.32)^1(0.68)^{14} + \binom{15}{2}(0.32)^2(0.68)^{13} + \binom{15}{3}(0.32)^3(0.68)^{12} + \binom{15}{4}(0.32)^4(0.68)^{11} = 0.4477$
c) P(5 ≤ X ≤ 9) = P(X = 5) + P(X = 6) + P(X = 7) + P(X = 8) + P(X = 9) = 0.546
d) P(5 < X < 10) = P(X = 6) + P(X = 7) + P(X = 8) + P(X = 9) = 0.33
e) P(X ≥ 6) = 1 – P(X ≤ 5) = 0.34

Practice Exercise 3:
a) $P(X = 3) = \binom{15}{3}(0.12)^3(0.88)^{12} = 0.1696$
b) $P(0 \le X \le 3) = P(X=0) + P(X=1) + P(X=2) + P(X=3)$
$= \binom{15}{0}(0.12)^0(0.88)^{15} + \binom{15}{1}(0.12)^1(0.88)^{14} + \binom{15}{2}(0.12)^2(0.88)^{13} + \binom{15}{3}(0.12)^3(0.88)^{12} = 0.9041$
c) $\mu = np = 15(0.12) = 1.8$
$\sigma = \sqrt{npq} = \sqrt{15(0.12)(0.88)} = 1.2586$

Practice Exercise 4:
a) Binomial distribution with n = 10 and p = 0.07
b) $P(X = 1) = \binom{10}{1}(0.07)^1(0.93)^9 = 0.3643$
$P(X = 2) = \binom{10}{2}(0.07)^2(0.93)^8 = 0.1234$
c) P(X ≤ 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = 0.9964
d) $P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - \binom{10}{0}(0.07)^0(0.93)^{10} = 0.5160$
e) $\mu = np = 10(0.07) = 0.7$
$\sigma = \sqrt{npq} = \sqrt{10(0.07)(0.93)} = 0.8068$

Practice Exercise 5: (Binomial)

Exercise G:
a) No cases of GBS
Poisson with mean = 1.3
x   P( X = x )
0   0.2725
b) At least one case of GBS
P(X ≥ 1) = 1 – P(X = 0) = 1 – 0.2725 = 0.7275
c) Fewer than five cases of GBS
P(X ≤ 4) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) = 0.989
Poisson with mean = 1.3
x   P( X <= x )
4   0.989337

Example H:
$P(292 \le x \le 649) = P\left(\dfrac{292 - 491}{119} \le \dfrac{x - \mu}{\sigma} \le \dfrac{649 - 491}{119}\right) = P(-1.67 \le Z \le 1.33) = 0.8606$
There is a 0.8606 probability that on a random day, the subject’s ammonia concentration is between 292 ppb and 649 ppb.

Practice Exercise 7: Given that a sample is approximately bell-shaped with a mean of 120 and a standard deviation of 10, the approximate percentage of data values that is expected to fall between 100 and 140 is
a) 75 percent.
b) 95 percent.
c) 68 percent.
d) 99.7 percent.

Practice Exercise 8: Which of the following is not true about the standard normal distribution?
a) The total probability under the standard normal curve is 1.
b) The standard normal curve is symmetric about 0.
c) About 68% of its observations fall between -1 and 1.
d) The probability under the standard normal curve to the left of z = 0 is negative.
e) About 95% of its observations fall between -2 and 2.

Practice Exercise 10: Consider the normal curves that have the parameters
i) μ = 1.5 and σ = 3
ii) μ = 1.5 and σ = 6.2
iii) μ = -2.7 and σ = 3
iv) μ = 0 and σ = 1
a) Which curve has the largest spread? (ii) only.
b) Which curves are centered at the same place? (i) and (ii)
c) Which curves have the same shape? (i) and (iii)
d) Which curve is centered farthest to the left? (iii) only.
e) Which curve is the standard normal curve? (iv) only.

Exercise J:
a) $P(x < 4) = P\left(Z < \dfrac{4 - 10}{3}\right) = P(Z < -2) = 0.0228$
b) $P(x > 5) = P\left(Z > \dfrac{5 - 10}{3}\right) = P(Z > -1.67) = 1 - 0.0475 = 0.9525$
c) $P(x < 3) = P\left(Z < \dfrac{3 - 10}{3}\right) = P(Z < -2.33) = 0.0099$

Practice Exercise 11: The brain weights of a certain population of adult Swedish males follow approximately a normal distribution with mean 1,400 gm and standard deviation 100 gm. What percentage of the brain weights are
a) 1,500 gm or less?
Cumulative Distribution Function
Normal with mean = 1400 and standard deviation = 100
x      P( X <= x )
1500   0.841345
P(X ≤ 1500) = 0.8413
b) between 1,325 and 1,500 gm?
Cumulative Distribution Function
Normal with mean = 1400 and standard deviation = 100
x      P( X <= x )
1500   0.841345
1325   0.226627
P(1325 ≤ X ≤ 1500) = 0.8413 – 0.2266 = 0.6147
c) 1,325 gm or more?
Cumulative Distribution Function
Normal with mean = 1400 and standard deviation = 100
x      P( X <= x )
1325   0.226627
P(X ≥ 1325) = 1 – P(X ≤ 1325) = 1 – 0.2266 = 0.7734
d) between 1,200 and 1,600 gm?
Cumulative Distribution Function
Normal with mean = 1400 and standard deviation = 100
x      P( X <= x )
1600   0.977250
1200   0.0227501
P(1200 ≤ X ≤ 1600) = 0.9772 – 0.0227 = 0.9545

Homework # 2

Exercise 1 (Binomial Distribution): It is known that 17 percent of the members of a certain population are highly susceptible to juvenile diabetes. What is the probability that in a sample of n = 10 subjects drawn at random from this population exactly 2 will have juvenile diabetes?

Exercise 2 (Binomial Distribution): A certain drug treatment cures 90% of cases of hookworm in children. Suppose that 15 children suffering from hookworm are to be treated. Let X be the number cured among the 15 children treated for the disease.
a) Find P(X < 13)
b) Find P(10 < X ≤ 15)

Exercise 3 (Poisson Distribution): A newborn baby is considered to have a low birth weight if it weighs less than 2500 grams. Such babies often require extra care. Cypress County, AB, has been experiencing a mean of 206 cases of low birth weight each year (this figure was made up, not true). Find the probability that on a given day, there is more than 1 baby born with a low birth weight.

Exercise 4 (Poisson Distribution): In a certain population an average of 13 new cases of esophageal cancer are diagnosed each year. If the annual incidence of esophageal cancer follows a Poisson distribution, find the probability that in a given year the number of newly diagnosed cases of esophageal cancer will be:
a) Exactly 10
b) Between 9 and 11, inclusive

Homework # 2 – Solutions

Exercise 1 (Binomial Distribution): Given n = 10 and p = 0.17,
$P(X = 2) = {}_{10}C_2\,(0.17)^2(0.83)^8 = \dfrac{10!}{2!\,(10-2)!}\,(0.17)^2(0.83)^8 = 0.2929$
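The hand calculation in Exercise 1 can be reproduced in a few lines. This is a minimal sketch assuming Python's standard `math` module; `binom_pmf` and `binom_cdf` are hypothetical helper names introduced here for illustration. They play the same role as the calculator's binompdf and binomcdf, but take their arguments in a different order.

```python
import math

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Bin(n, p): nCx * p^x * q^(n - x), with q = 1 - p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x), obtained by summing the pmf from 0 up to x."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

# Homework 2, Exercise 1: n = 10, p = 0.17, exactly 2 successes
prob = binom_pmf(2, 10, 0.17)
print(round(prob, 4))  # 0.2929
```

Summing the pmf term by term, as `binom_cdf` does, mirrors how the cumulative probabilities in these notes are written out by hand.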
Exercise 2 (Binomial Distribution):
a) P(X < 13) = P(X ≤ 12) = 0.1841
Cumulative Distribution Function
Binomial with n = 15 and p = 0.9
x    P( X <= x )
12   0.184061
b) P(10 < X ≤ 15) = P(X ≤ 15) – P(X ≤ 10) = 1 – 0.0127 = 0.9873
Cumulative Distribution Function
Binomial with n = 15 and p = 0.9
x    P( X <= x )
10   0.0127205

Exercise 3 (Poisson Distribution): The average number of cases per day is 206/365 = 0.564, so X ~ Poisson(λ = 0.564).
$P(X > 1) = 1 - P(X \le 1) = 1 - [P(X = 0) + P(X = 1)] = 1 - \left[\dfrac{e^{-0.564}(0.564)^0}{0!} + \dfrac{e^{-0.564}(0.564)^1}{1!}\right] = 1 - (0.5689 + 0.3209) = 0.1102$

Exercise 4 (Poisson Distribution):
a) P(X = 10) = 0.0859
Probability Density Function
Poisson with mean = 13
x    P( X = x )
10   0.0858702
b) P(9 ≤ X ≤ 11) = P(X ≤ 11) – P(X ≤ 8) = 0.3532 – 0.0998 = 0.2534
Cumulative Distribution Function
Poisson with mean = 13
x    P( X <= x )
11   0.353165
Cumulative Distribution Function
Poisson with mean = 13
x    P( X <= x )
8    0.0997579

MATH 2333: Statistics for Life Sciences
Practice Problems for Term Test # 2

1) Many new drugs have been introduced in the last several decades to bring hypertension under control, that is, to reduce high blood pressure to normotensive levels. Suppose a physician agrees to use a new antihypertensive drug on a trial basis on the first 4 untreated hypertensives she encounters in her practice, before deciding whether to adopt the drug for routine use. Let X = the number of patients out of 4 who are brought under control. Then X is a discrete random variable which takes on the values 0, 1, 2, 3, and 4. Suppose from previous experience with the drug, the drug company expects that for any clinical practice the probability that 0 patients out of 4 will be brought under control is 0.008, 1 patient out of 4 is 0.076, 2 patients out of 4 is 0.265, 3 patients out of 4 is 0.411 and all 4 patients is 0.240.
a) Draw the distribution table.
b) Check that it is a correct distribution.
c) Compute the following probabilities: P(X ≤ 2), P(X > 1), P(1 ≤ X ≤ 3)
d) Calculate µ = E(X), σ² = Var(X) and σ = SD(X)
Source: Fundamentals of Biostatistics, 6th Edition by Bernard Rosner (2006)

2) A group of college students were surveyed to learn how many times they had visited a dentist in the previous year. The probability distribution for Y, the number of visits, is given by the following table:

y (number of visits)   0      1      2      3      Total
P(Y = y)               0.15   0.50   0.30   0.05   1

Calculate the mean, μ, of the number of visits and the standard deviation, σ, of the random variable Y.

3) A recent study reported that the prevalence of hyperlipidemia (defined as total cholesterol over 200) is 40% in children 2 to 6 years of age. If 15 children are analyzed:
a) What is the probability that at least 3 are hyperlipidemic?
b) How many would be expected to meet the criteria for hyperlipidemia? In other words, what is the mean number of children expected to be hyperlipidemic?

4) Researchers looked at the occurrence of retinal capillary hemangioma (RCH) in patients with von Hippel–Lindau (VHL) disease. RCH is a benign vascular tumor of the retina. Using a retrospective consecutive case series review, the researchers found that the number of RCH tumor incidents followed a Poisson distribution with λ = 5 tumors per eye for patients with VHL. Find the probability that in a randomly selected patient with VHL there are between four and six occurrences of tumors per eye, inclusive.

5) Alpha fetoprotein (AFP) is a substance produced by a fetus that can be measured in pregnant women to assess the probability of problems with fetal development. High levels of AFP have been seen in babies with neural-tube defects. When measured at 15-20 weeks gestation, AFP is normally distributed with a mean of 58 and a standard deviation of 18. What is the probability that AFP exceeds 75 in a pregnant woman measured at 18 weeks gestation?
In other words, what is P(X > 75)? – Chapter 4.
In a sample of n = 50 women, what is the probability that their mean AFP exceeds 62? In other words, what is $P(\bar{X} > 62)$? This part requires the application of the Central Limit Theorem (CLT) – Chapter 5.
$P(\bar{X} > 62) = P\left(\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \dfrac{62 - 58}{18/\sqrt{50}}\right) = P(Z > 1.57) = 1 - P(Z < 1.57) = 1 - 0.9418 = 0.0582$
There is about a 5.8% chance that the sample mean AFP of 50 women will exceed 62.

6) According to data from the second National Health and Nutrition Examination Survey (NHANES II, NCHS 1992), 26.8 percent of persons 20–74 years of age in the U.S. had high serum cholesterol values. In a sample of n = 20 persons ages 20–74 years, what is the probability that fewer than 2 persons will have high serum cholesterol?

7) Based on reports from the Public Health Agency of Canada and the World Health Organization, the estimated incidence of tuberculosis (all forms of TB, not just smear positive TB) on average is 5 cases per 100,000 population in Canada. This follows a Poisson process with λ = 5. What is the probability of a health department, in a provincial county of 100,000, observing between 4 and 6 cases, inclusive, if the national rate held in the county? (Source: http://www.phac-aspc.gc.ca/tbpc-latb/itir-eng.php)

8) Suppose it is known from previous treatment data that the probability of recovery for a certain disease is p = 0.35. If n = 12 people are stricken with the disease, what is the probability that:
a) 3 or more will recover?
b) between 4 and 6, inclusive, will recover?

9) Neuroblastoma is an extracranial solid cancer in childhood and the most common cancer in infancy, a serious but treatable disease. A urine test called the VMA test has been developed that gives a positive diagnosis in about 70% of cases of neuroblastoma. Assume that a large number of children are to be tested, of whom a sample of eight children have the disease. We are interested in whether or not the test detects the disease in the sample of children who have the disease.
Find the probability that between four and seven cases, inclusive, will be detected.

10) Tetanus is caused by a neurotoxic spore (a powerful poison that acts on nerves of the spinal cord) produced by the tetanus bacterium Clostridium tetani. The disease, which used to kill about 40 to 50 Canadians a year in the 1920s and 30s, is now only rarely reported. In recent years, Canada has seen, on average, only a couple of cases a year. The number of cases of tetanus reported in Canada during 2011 has a Poisson distribution with parameter λ = 2. What is the probability that two or more cases of tetanus will be reported during a given year?

11) The heights of men (in inches) in a certain population follow a normal distribution with mean µ = 69.7 inches and standard deviation σ = 3.1 inches. If a man is chosen at random from the population, find the probability that he will be less than 72 inches tall.

12) A certain drug causes kidney damage in 1% of patients. Suppose the drug is to be tested on 50 patients. Find the probability that
a) none of the patients will experience kidney damage.
b) one or more of the patients will experience kidney damage. [Hint: Use part a) to answer part b).]

13) According to the British Columbia Centre of Excellence for Women's Health, the Canadian caesarean section (C-Section) rate is approximately p = 26%. Suppose a random sample of n = 12 deliveries is selected from the hospital records for 2013. What is the probability that
a) exactly 3 babies were delivered using C-Section?
b) fewer than 3 babies were delivered using C-Section?

14) The serum cholesterol levels of 12- to 14-year-olds follow a normal distribution with mean 162 mg/dl and standard deviation 28 mg/dl. Determine the percentage of 12- to 14-year-olds who have serum cholesterol values between 166 mg/dl and 194 mg/dl.
15) In a health examination survey of a particular province in Canada, the fasting blood glucose level for the population is normally distributed with a mean of µ = 99.0 and a standard deviation of σ = 12. Determine the probability that an individual selected at random will have a blood sugar reading greater than 120.

16) A group of college students were surveyed to learn how many times they had visited a dentist in the previous year. The discrete probability distribution for X, the number of visits, is given by the following table:

x (number of visits)   P(X = x)
0                      0.15
1                      0.40
2                      0.20
3                      0.10
4                      0.15
Total                  1

Calculate the mean, µ, of the number of visits and the standard deviation, σ, of the random variable X.
Compute the following probabilities: $P(X \le 2)$, $P(X > 1)$, $P(1 \le X \le 3)$

17) Microfracture knee surgery has a 75% chance of success on patients with degenerative knees. The surgery is performed on ten patients.
a) Find the probability of the surgery being successful on exactly six patients.
b) Find the probability of the surgery being successful on between six and eight patients, inclusive.
c) What are the mean and standard deviation of the number of patients for whom the surgery is expected to be successful?

18) Tetanus is caused by a neurotoxin (a powerful poison that acts on the nerves of the spinal cord) produced by the tetanus bacterium Clostridium tetani. The disease, which used to kill about 40 to 50 Canadians a year in the 1920s and 30s, is now only rarely reported. In recent years, Canada has rarely seen any cases. (Source: Canadian Notifiable Disease Surveillance System and the Public Health Agency of Canada.) There were four cases of tetanus reported in Canada during 2007. Suppose the occurrence of tetanus cases follows a Poisson distribution with λ = 4 per year. During a given year, what is the probability that there will be three or more cases of tetanus reported?

19) A survey was conducted to measure the height of U.S. men.
In the survey, respondents were grouped by age. In the 20–29 age group, the heights were normally distributed, with a mean of 69.2 inches and a standard deviation of 2.9 inches. A study participant is randomly selected. (Source: U.S. National Center for Health Statistics)
DRAW AND LABEL TWO NORMAL CURVES FOR EACH PART AND SHADE THE AREA UNDER THE CURVE.
a) Find the probability that his height is less than 66 inches.
b) Find the probability that his height is between 66 and 72 inches.
c) Find the probability that his height is more than 72 inches.

MATH 2333: Statistics for Life Sciences
Practice Problems for Term Test # 2 – Solutions

1) X is a discrete random variable taking values x = 0, 1, 2, 3, 4. The probability distribution is given by

x       p(x)    x p(x)   x^2   x^2 p(x)
0       0.008   0        0     0
1       0.076   0.076    1     0.076
2       0.265   0.53     4     1.06
3       0.411   1.233    9     3.699
4       0.24    0.96     16    3.84
Total   1       2.799    *     8.675

$P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.008 + 0.076 + 0.265 = 0.349$
$P(X > 1) = 1 - P(X \le 1) = 1 - 0.084 = 0.916$
$P(1 \le X \le 3) = P(X = 1) + P(X = 2) + P(X = 3) = 0.076 + 0.265 + 0.411 = 0.752$
$\mu = \sum x\,p(x) = 2.799$
$\sigma^2 = \sum x^2 p(x) - \mu^2 = 8.675 - (2.799)^2 = 0.8406$
$\sigma = \sqrt{0.8406} = 0.9168$

2)
x (number of visits)   P(X = x)   x p(x)   x^2   x^2 p(x)
0                      0.15       0        0     0
1                      0.50       0.50     1     0.50
2                      0.30       0.60     4     1.20
3                      0.05       0.15     9     0.45
Total                  1          1.25     *     2.15

$\mu = \sum x\,p(x) = 1.25$
$\sigma = \sqrt{\sum x^2 p(x) - \mu^2} = \sqrt{2.15 - (1.25)^2} = \sqrt{0.5875} = 0.766$

3) Binomial distribution: Let X be the number of children who are reported to be hyperlipidemic from a sample of n = 15. The probability of success is p = 0.40.
a) $P(X \ge 3) = 1 - P(X \le 2) = 1 - [P(X = 0) + P(X = 1) + P(X = 2)]$
$= 1 - \left[{}_{15}C_{0}\,(0.40)^{0}(0.60)^{15} + {}_{15}C_{1}\,(0.40)^{1}(0.60)^{14} + {}_{15}C_{2}\,(0.40)^{2}(0.60)^{13}\right] = 1 - 0.0271 = 0.9729$
c) The expected number of children to be hyperlipidemic is $\mu = np = 15(0.40) = 6$ children.

4) $P(4 \le X \le 6) = P(X = 4) + P(X = 5) + P(X = 6) = \dfrac{e^{-5}5^{4}}{4!} + \dfrac{e^{-5}5^{5}}{5!} + \dfrac{e^{-5}5^{6}}{6!} \approx 0.1755 + 0.1755 + 0.1462 \approx 0.497$

5) This addresses the probability of observing a single woman with an AFP exceeding 75.
$P(X > 75) = P\left(\dfrac{X - \mu}{\sigma} > \dfrac{75 - 58}{18}\right) = P(Z > 0.94) = 1 - P(Z \le 0.94) = 1 - 0.8264 = 0.1736$

6) The data follow a Binomial distribution with n = 20, p = 0.268, q = 0.732.
$P(X < 2) = P(X = 0) + P(X = 1) = \binom{20}{0}(0.268)^{0}(0.732)^{20} + \binom{20}{1}(0.268)^{1}(0.732)^{19} \approx 0.00195 + 0.01428 \approx 0.0162$

7) Poisson distribution:
$P(4 \le X \le 6) = P(X = 4) + P(X = 5) + P(X = 6) = \dfrac{e^{-5}5^{4}}{4!} + \dfrac{e^{-5}5^{5}}{5!} + \dfrac{e^{-5}5^{6}}{6!} \approx 0.1755 + 0.1755 + 0.1462 \approx 0.497$

8) Let X be the number of people who recover from the disease. Here X is a discrete random variable following a Binomial distribution with n = 12 and probability of success p = 0.35.
a. $P(X \ge 3) = 1 - P(X \le 2) = 1 - [P(X = 0) + P(X = 1) + P(X = 2)]$
$= 1 - \left[{}_{12}C_{0}\,(0.35)^{0}(0.65)^{12} + {}_{12}C_{1}\,(0.35)^{1}(0.65)^{11} + {}_{12}C_{2}\,(0.35)^{2}(0.65)^{10}\right] = 0.8487$
b. $P(4 \le X \le 6) = P(X = 4) + P(X = 5) + P(X = 6)$
$= {}_{12}C_{4}\,(0.35)^{4}(0.65)^{8} + {}_{12}C_{5}\,(0.35)^{5}(0.65)^{7} + {}_{12}C_{6}\,(0.35)^{6}(0.65)^{6} = 0.2367 + 0.2039 + 0.1281 = 0.5687$

9) Binomial Distribution
$P(4 \le X \le 7) = P(X = 4) + P(X = 5) + P(X = 6) + P(X = 7)$
$= \binom{8}{4}(0.70)^{4}(0.30)^{4} + \binom{8}{5}(0.70)^{5}(0.30)^{3} + \binom{8}{6}(0.70)^{6}(0.30)^{2} + \binom{8}{7}(0.70)^{7}(0.30)^{1}$
$= 0.1361 + 0.2541 + 0.2965 + 0.1977 = 0.8844$

10) Poisson distribution: X = number of cases of tetanus reported, with λ = 2.
$P(X \ge 2) = 1 - P(X \le 1) = 1 - [P(X = 0) + P(X = 1)] = 1 - \left[\dfrac{e^{-2}2^{0}}{0!} + \dfrac{e^{-2}2^{1}}{1!}\right] = 1 - 0.4060 = 0.5940$

11) Normal Distribution
$P(X < 72) = P\left(\dfrac{X - \mu}{\sigma} < \dfrac{72 - 69.7}{3.1}\right) = P(Z < 0.74) = 0.7703$

12) Binomial Distribution
a) $P(X = 0) = \binom{50}{0}(0.01)^{0}(0.99)^{50} = 0.6050$
b) $P(X \ge 1) = 1 - P(X = 0) = 1 - 0.6050 = 0.3950$

13) Binomial Distribution
a) $P(X = 3) = {}_{12}C_{3}\,(0.26)^{3}(0.74)^{9} = 0.2573$

Probability Density Function
Binomial with n = 12 and p = 0.26
x    P( X = x )
3    0.257293

b) $P(X < 3) = P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2)$
$= {}_{12}C_{0}\,(0.26)^{0}(0.74)^{12} + {}_{12}C_{1}\,(0.26)^{1}(0.74)^{11} + {}_{12}C_{2}\,(0.26)^{2}(0.74)^{10} = 0.3603$

Cumulative Distribution Function
Binomial with n = 12 and p = 0.26
x    P( X <= x )
2    0.360338

14) X = serum cholesterol levels of 12- to 14-year-olds. X ~ N(µ = 162, σ = 28).
$P(166 \le X \le 194) = P(0.14 \le Z \le 1.14) = 0.8735 - 0.5568 = 0.3167$, so about 31.7% of 12- to 14-year-olds have values in this range.

Cumulative Distribution Function
Normal with mean = 162 and standard deviation = 28
x      P( X <= x )
194    0.873451

Cumulative Distribution Function
Normal with mean = 162 and standard deviation = 28
x      P( X <= x )
166    0.556798

15) X ~ N(µ = 99.0, σ = 12).
$P(X > 120) = P\left(\dfrac{X - \mu}{\sigma} > \dfrac{120 - 99}{12}\right) = P(Z > 1.75) = 1 - P(Z \le 1.75) = 1 - 0.9599 = 0.0401$
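
The binomial, Poisson, and normal answers above can be cross-checked numerically. The following is a minimal sketch using only the Python standard library; the helper names (binom_pmf, binom_cdf, poisson_pmf, normal_cdf) are illustrative and do not come from the course materials. They are meant to mirror the calculator commands binompdf and binomcdf. Because the normal probabilities are computed from unrounded z-values, they may differ slightly in the last digit from table look-ups.

```python
import math

def binom_pmf(n, p, x):
    """P(X = x) for X ~ Binomial(n, p); plays the role of binompdf(n, p, x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(n, p, x):
    """P(X <= x) for X ~ Binomial(n, p); plays the role of binomcdf(n, p, x)."""
    return sum(binom_pmf(n, p, k) for k in range(x + 1))

def poisson_pmf(lam, x):
    """P(X = x) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ N(mu, sigma), computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Keys are the solution numbers used in this handout.
checks = {
    "7":   sum(poisson_pmf(5, k) for k in range(4, 7)),       # P(4 <= X <= 6)
    "8a":  1 - binom_cdf(12, 0.35, 2),                        # P(X >= 3)
    "9":   sum(binom_pmf(8, 0.70, k) for k in range(4, 8)),   # P(4 <= X <= 7)
    "10":  1 - (poisson_pmf(2, 0) + poisson_pmf(2, 1)),       # P(X >= 2)
    "13a": binom_pmf(12, 0.26, 3),                            # P(X = 3)
    "13b": binom_cdf(12, 0.26, 2),                            # P(X <= 2)
    "14":  normal_cdf(194, 162, 28) - normal_cdf(166, 162, 28),
    "15":  1 - normal_cdf(120, 99, 12),                       # P(X > 120)
}

for label, value in checks.items():
    print(f"Solution {label}: {value:.4f}")
```

Each printed value should agree with the corresponding worked answer to about three decimal places. A larger disagreement usually points to a wrong complement or an off-by-one error in an inclusive range.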