Uploaded by Ceilidh Gillis

Chapter 4 and 5

MATH 2333
Chapter 4
Probability Distributions of Discrete Variables
_____________________________________________________________
Random Variables
Definition: A random variable is a function that assigns a numeric value to
each event in a sample space.
Example: Coin Toss
Event   Head   Tail
X       0      1
Definition: The probability distribution of a discrete random variable is a table
used to specify all possible values of a discrete random variable along with
their respective probabilities.
If x1, x2, x3, . . . , xk are all possible values of the discrete random variable X,
then the following two properties are essential for a probability
distribution of a discrete variable:
(1) 0 ≤ P(X = x) ≤ 1, for all x.
(2) ∑ P(X = x) = 1.
Practice Exercise 1: Explain why each of the following distributions is or is
not a probability distribution:
Mean and Variance of Discrete Probability Distributions:
The formulas are below. The standard deviation σ is the square root of the
variance. The mean can be written as μ; the subscript x is not required.
μ = ∑ x p(x)
σ² = ∑ x² p(x) − μ²
where p(x) = P(X = x).
Practice Exercise 2: Let X be a discrete random variable taking values 0, 1,
2, 3, 4 or 5.
x          0      1      2      3      4      5      Total
P(X = x)   0.75   ____   0.05   0.04   0.02   0.01   ____
a) Fill in the table so that it is a probability distribution.
b) Determine the mean, standard deviation and variance of the distribution.
c) Compute the following probabilities:
P(X ≥ 3), P(X = 2), P(X ≤ 4), P(X > 4), P(1 ≤ X ≤ 4), P(X = 0)
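The checks and calculations asked for in this kind of exercise can be sketched in a few lines of Python. This is only an illustration, assuming the usual definitions μ = ∑ x p(x) and σ² = ∑ x² p(x) − μ². The function names and the small example table are invented for this sketch and are not part of the course material.

```python
import math

def is_probability_distribution(probs, tol=1e-9):
    """Check the two properties: each p lies in [0, 1] and the p's sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return in_range and sums_to_one

def mean_var_sd(values, probs):
    """Return (mean, variance, standard deviation) of a discrete distribution."""
    mu = sum(x * p for x, p in zip(values, probs))
    var = sum(x * x * p for x, p in zip(values, probs)) - mu ** 2
    return mu, var, math.sqrt(var)

# Hypothetical table, invented for illustration (not the exercise data)
values = [0, 1, 2]
probs = [0.5, 0.3, 0.2]

if is_probability_distribution(probs):
    mu, var, sd = mean_var_sd(values, probs)
    print(f"mean = {mu:.4f}, variance = {var:.4f}, sd = {sd:.4f}")
```

The tolerance in the sum check is there because decimal probabilities are not stored exactly in floating point.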
The Binomial Distribution
When a random process or experiment, called a trial, can result in only one of
two mutually exclusive outcomes, such as dead or alive, sick or well, full-term
or premature, the trial is called a Bernoulli trial.
The Bernoulli Process
A sequence of Bernoulli trials forms a Bernoulli process under the following
conditions.
1. Each trial results in one of two possible, mutually exclusive, outcomes. One
of the possible outcomes is denoted (arbitrarily) as a success, and the other is
denoted a failure.
2. The probability of a success, denoted by p, remains constant from trial to
trial. The probability of a failure (1 – p), is denoted by q.
3. The trials are independent; that is, the outcome of any particular trial is not
affected by the outcome of any other trial.
The binomial distribution has two parameters, n and p, where n represents the
number of independent Bernoulli trials and p is the probability of success. The
probability of failure is denoted by q = (1 – p).
The binomial probability for x successes in n trials is given as
P(X = x) = [n! / (x!(n − x)!)] p^x q^(n − x) = nCx p^x q^(n − x),  for x = 0, 1, 2, . . . , n
When you use an uppercase symbol (X), it is a random variable and when you
use lower case (x) it is a value.
The mean, variance and standard deviation of the binomial distribution are
μ = np, σ² = np(1 − p) = npq and σ = √(npq), respectively.
TI Calculator Commands: X is a discrete random variable following Binomial distribution with
parameters n and p.
Press 2nd VARS
binompdf(n, p, x) for exact probability
binomcdf(n, p, x) for cumulative probability
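For readers without a TI calculator, the same two quantities can be sketched in Python directly from the binomial formula. The function names binom_pmf and binom_cdf are chosen here to mirror the calculator commands; they are not the calculator's own implementation, and the values n = 10, p = 0.5 are invented for illustration.

```python
from math import comb

def binom_pmf(n, p, x):
    """P(X = x): exact probability of x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_cdf(n, p, x):
    """P(X <= x): cumulative probability, summing the exact probabilities."""
    return sum(binom_pmf(n, p, k) for k in range(x + 1))

# Hypothetical parameters, for illustration only
n, p = 10, 0.5

exactly_5 = binom_pmf(n, p, 5)                        # P(X = 5)
at_most_3 = binom_cdf(n, p, 3)                        # P(X <= 3)
at_least_4 = 1 - binom_cdf(n, p, 3)                   # P(X >= 4) = 1 - P(X <= 3)
from_3_to_6 = binom_cdf(n, p, 6) - binom_cdf(n, p, 2) # P(3 <= X <= 6)

print(exactly_5, at_most_3, at_least_4, from_3_to_6)
```

Note the pattern for an inclusive range: P(a ≤ X ≤ b) = P(X ≤ b) − P(X ≤ a − 1), so the lower cutoff passed to the cumulative function is one less than a.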
Exercise A: 32% of people in a certain town have diabetes. A sample of 15
people is taken. Let X be the number in the sample with diabetes. Find the
following probabilities:
a) The probability that exactly 3 people in the sample have diabetes.
b) The probability that fewer than 5 have diabetes.
c) The probability that between 5 and 9 (inclusive) have diabetes.
d) P(5 < X < 10)
e) The probability that at least 6 have diabetes.
Exercise B: Suppose it is known that 10 percent of a certain population is
color blind. If a random sample of 25 people is drawn from this population, find
the probability that:
a) Three or fewer will be color blind.
b) Four or more will be color blind.
c) Between three and six inclusive will be color blind.
d) Two, three, or four will be color blind.
Practice Exercise 3: Allergic rhinitis is an inflammation of the nasal airways
that occurs when an allergen, such as pollen or dust is inhaled by an
individual.
It is known that 12% of the members of a certain population with a sensitized
immune system are highly susceptible to allergic rhinitis.
A random sample of n = 15 subjects is drawn from this population. What is the
probability that
a) exactly 3 will have allergic rhinitis?
Binomial with n = 15 and p = 0.12
x    P( X = x )
3    0.1695
b) between zero and three, inclusive, will have allergic rhinitis?
Binomial with n = 15 and p = 0.12
x    P( X <= x )
3    0.9041
c) Determine the mean and standard deviation.
μ = np =
σ = √(npq) =
Practice Exercise 4: Antibiotic resistance occurs when disease-causing
microbes become resistant to antibiotic drug therapy. Because this resistance
is typically genetic and transferred to the next generation of microbes, it is a
very serious public health problem.
According to the Centers for Disease Control (CDC), 7% of gonorrhea cases
treated in 2004 were resistant to the antibiotic ciprofloxacin. A physician
prescribed ciprofloxacin for the treatment of 10 cases of gonorrhea during one
week in 2004. (Source: The Practice of Statistics in the Life Sciences, 2nd Ed.,
by Baldi and Moore)
a) What is the distribution of the cases resistant to ciprofloxacin?
b) What is the probability that exactly 1 out of the 10 cases was resistant to
ciprofloxacin? What is the probability for exactly 2 out of 10?
c) What is the probability that at most 3 out of the 10 cases were resistant to
ciprofloxacin?
d) What is the probability that 1 or more of the 10 cases were resistant to
ciprofloxacin? (Hint: It is easier to first find the probability that exactly 0 of the
10 cases were resistant.)
e) What is the mean number of gonorrhea cases that are resistant to the
antibiotic ciprofloxacin out of 10 cases? What is the standard deviation σ of
the count of antibiotic-resistant cases?
Solution:
Practice Exercise 5: The probability is 0.314 that the gestation period of a
woman will exceed 9 months. In six human births, what is the probability that
the number in which the gestation period exceeds 9 months is
a) exactly three?
b) exactly five?
c) at least five?
d) between three and five, inclusive?
The Poisson Distribution
If x is the number of independent occurrences of some random event in an
interval of time or space, the probability that x will occur is given by
f(x) = P(X = x) = e^(−λ) λ^x / x!,  for x = 0, 1, 2, . . .
The Greek letter λ (lambda) is called the parameter of the distribution and is
the average number of occurrences of the random event in the interval. The
symbol e is the constant (to four decimals) 2.7183.
The mean, variance and standard deviation of the Poisson distribution
are μ = λ, σ² = λ and σ = √λ, respectively.
TI Calculator Commands: X is a discrete random variable following Poisson distribution with
parameter λ.
Press 2nd VARS
poissonpdf(µ, x) for exact probability
poissoncdf(µ, x) for cumulative probability
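As with the binomial commands, the Poisson probabilities can be sketched in Python straight from the formula f(x) = e^(−λ) λ^x / x!. The function names below are chosen to echo the calculator commands and are an assumption of this sketch, as is the value λ = 2.5.

```python
from math import exp, factorial

def poisson_pmf(lam, x):
    """P(X = x) for a Poisson distribution with parameter lam."""
    return exp(-lam) * lam ** x / factorial(x)

def poisson_cdf(lam, x):
    """P(X <= x): sum of the exact probabilities from 0 up to x."""
    return sum(poisson_pmf(lam, k) for k in range(x + 1))

# Hypothetical rate, for illustration only
lam = 2.5

exactly_2 = poisson_pmf(lam, 2)        # P(X = 2)
at_most_2 = poisson_cdf(lam, 2)        # P(X <= 2)
at_least_3 = 1 - poisson_cdf(lam, 2)   # P(X >= 3) = 1 - P(X <= 2)
more_than_3 = 1 - poisson_cdf(lam, 3)  # P(X > 3)  = 1 - P(X <= 3)

print(exactly_2, at_most_2, at_least_3, more_than_3)
```

"At least 3" and "more than 3" use different cutoffs in the cumulative function, which is the most common slip in these exercises.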
Example C: In a study of drug-induced anaphylaxis among patients taking
rocuronium bromide as part of their anesthesia, researchers found that the
occurrence of anaphylaxis followed a Poisson model with λ = 12 incidents per
year in Norway.
Find the probability that in the next year, among patients receiving
rocuronium, exactly three will experience anaphylaxis.
Example D Refer to Example C. What is the probability that at least three
patients in the next year will experience anaphylaxis if rocuronium is
administered with anesthesia?
Exercise E: Researchers looked at the occurrence of retinal capillary
hemangioma (RCH) in patients with von Hippel–Lindau (VHL) disease. RCH is a
benign vascular tumor of the retina. Using a retrospective consecutive case
series review, the researchers found that the number of RCH tumor incidents
followed a Poisson distribution with λ = 4 tumors per eye for patients with VHL.
Using this model, find the probability that in a randomly selected patient with
VHL:
a) There are exactly five occurrences of tumors per eye.
b) There are more than five occurrences of tumors per eye.
c) There are fewer than five occurrences of tumors per eye.
d) There are between five and seven occurrences of tumors per eye, inclusive.
Exercise F: In a certain population an average of 13 new cases of esophageal
cancer are diagnosed each year. If the annual incidence of esophageal cancer
follows a Poisson distribution, find the probability that in a given year the
number of newly diagnosed cases of esophageal cancer will be:
a) Exactly 10
b) At least three
c) No more than three
d) Between 12 and 15, inclusive
e) Fewer than four
Exercise G: In a study of the relationship between measles vaccination and
Guillain-Barré syndrome (GBS), Silveira et al., used a Poisson model in the
examination of the occurrence of GBS during latent periods after vaccinations.
They conducted their study in Argentina, Brazil, Chile, and Colombia. They
found that during the latent period, the rate of GBS was λ = 1.3 cases per day.
Using this estimate, find the probability on a given day of:
a) No cases of GBS
b) At least one case of GBS
c) Fewer than five cases of GBS
Practice Exercise 6: The number of cases of tetanus reported in Canada
during 2011 has a Poisson distribution with parameter λ = 4.
a) What is the probability that exactly three cases of tetanus will be reported
during a given year?
Probability Density Function
Poisson with mean = 4
x    P( X = x )
3    0.195367
P(X = 3) = 0.1953
b) What is the probability that four or more cases of tetanus will be reported?
Cumulative Distribution Function
Poisson with mean = 4
x    P( X <= x )
3    0.433470
P(X ≥ 4) = 1 – P(X ≤ 3) = 1 – 0.4335 = 0.5665
TEST 1 COVERS UP TO HERE
Chapter 5
Probability Distributions of Continuous Variables
Continuous Probability Distributions
The probability distributions considered thus far, the binomial and the Poisson,
are distributions of discrete variables. Let us now consider distributions of
continuous random variables.
A continuous variable is one that can assume any value within a specified
interval of values assumed by the variable. Consequently, between any two
values assumed by a continuous variable, there exist an infinite number of
values.
If a continuous random variable has a distribution with a graph that is
symmetric and bell-shaped, as in the below figure, we say that it has a normal
distribution.
Characteristics of the Normal Distribution
The following are some important characteristics of the normal distribution.
1. It is symmetrical about its mean μ; the curve on either side of μ is a mirror
image of the other side. 50 percent of the area is to the right of a
perpendicular erected at the mean, and 50 percent is to the left.
2. The mean, the median, and the mode are all equal.
3. The total area under the curve above the x-axis is one square unit. This
characteristic follows from the fact that the normal distribution is a probability
distribution.
4. One-Sigma Rule: Approximately 68 percent of the data values should lie
within one standard deviation of the mean. See Figure 4.6.2 (a)
Two-Sigma Rule: Approximately 95 percent of the data values should lie within
two standard deviations of the mean. See Figure 4.6.2 (b)
Three-Sigma Rule: Approximately 99.7 percent of the data values should lie
within three standard deviations of the mean. See Figure 4.6.2 (c)
5. The normal distribution is completely determined by the parameters
μ and σ. Because of the characteristics of these two parameters, μ is often
referred to as a location parameter and σ is often referred to as a shape
parameter.
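The three sigma rules can be checked numerically. A minimal sketch, assuming the standard identity that the area within k standard deviations of the mean of any normal distribution equals erf(k/√2); the function name within_k_sigma is invented for this illustration.

```python
from math import erf, sqrt

def within_k_sigma(k):
    """Area under a normal curve between mu - k*sigma and mu + k*sigma."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {within_k_sigma(k):.4f}")
```

The result does not depend on μ or σ, which is why the same three percentages apply to every normal distribution.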
Practice Exercise 7: Given that a sample is approximately bell-shaped with a
mean of 120 and a standard deviation of 10, the approximate percentage of
data values that is expected to fall between 100 and 140 is
a) 75 percent.
b) 95 percent.
c) 68 percent.
d) 99.7 percent.
The Standard Normal Distribution
The most important member of this family is the standard normal distribution or
unit normal distribution, as it is sometimes called. It may be obtained by
creating a random variable
z = (x − μ) / σ
The following figure illustrates the conversion from a nonstandard to a
standard normal distribution.
The standard normal distribution is a normal probability distribution with μ = 0
and σ = 1. The total area under its density curve is equal to 1.
Notation
P(a < z < b)
denotes the probability that the z score is between a and b.
P(z > a)
denotes the probability that the z score is greater than a.
P(z < a)
denotes the probability that the z score is less than a.
Practice Exercise 8: Which of the following is not true about the standard
normal distribution?
a) The total probability under the standard normal curve is 1.
b) The standard normal curve is symmetric about 0.
c) About 68% of its observations fall between -1 and 1.
d) The probability under the standard normal curve to the left of z = 0 is negative.
(Probability can’t be negative.)
e) About 95% of its observations fall between -2 and 2.
Methods for Finding Normal Distribution Areas
Find the following probabilities, using the z table.
P( z < 2 ) = P(z <=2 ) =
P( z < 0 ) =
P( z < 1.5 ) =
P( z > 2 ) =
P(z > 0.23 ) =
P( z > -1.31 ) =
P( -2 < z < 2 ) =
P( -1.5 < z < 1.5 ) =
P( -1.23 < z < 2.34 ) =
P( z = 0 ) =
P( z = 1.35 ) =
Rules:
P(z > a) = 1 – P(z < a )
P(a < z < b) = P(z < b) – P(z < a)
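The z table lookups and the two rules above can be reproduced with Python's standard library, which may help when checking hand calculations. This is a sketch only; the helper names p_less, p_greater and p_between are invented here, and table values rounded to four decimals can differ slightly in the last digit.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # standard normal distribution

def p_less(a):
    """P(z < a), the value a z table reports."""
    return Z.cdf(a)

def p_greater(a):
    """P(z > a) = 1 - P(z < a)."""
    return 1 - Z.cdf(a)

def p_between(a, b):
    """P(a < z < b) = P(z < b) - P(z < a)."""
    return Z.cdf(b) - Z.cdf(a)

print(round(p_less(1.5), 4))
print(round(p_greater(0.23), 4))
print(round(p_between(-1.23, 2.34), 4))
```

For a continuous variable the probability of any single value, such as P(z = 0), is 0, so P(z < a) and P(z ≤ a) are equal.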
Normal Distribution Applications
Example H Diskin et al. (A-11) studied common breath metabolites such as
ammonia, acetone, isoprene, ethanol, and acetaldehyde in five subjects over a
period of 30 days. Each day, breath samples were taken and analyzed in the
early morning on arrival at the laboratory.
For subject A, a 27-year-old female, the ammonia concentration in parts per
billion (ppb) followed a normal distribution over 30 days with mean 491 and
standard deviation 119. What is the probability that on a random day, the
subject’s ammonia concentration is between 292 and 649 ppb?
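Problems of this type follow one pattern: convert each endpoint to a z score with z = (x − μ)/σ, then subtract the two standard normal areas. A minimal sketch of that pattern, with a function name and example numbers (mean 100, standard deviation 15) that are invented for illustration:

```python
from statistics import NormalDist

def normal_prob_between(lo, hi, mu, sigma):
    """P(lo < X < hi) for X normal with mean mu and standard deviation sigma."""
    z_lo = (lo - mu) / sigma   # standardize the lower endpoint
    z_hi = (hi - mu) / sigma   # standardize the upper endpoint
    std = NormalDist()         # standard normal, mean 0 and sd 1
    return std.cdf(z_hi) - std.cdf(z_lo)

# Hypothetical values, for illustration only
print(round(normal_prob_between(85, 115, mu=100, sigma=15), 4))
```

Because the code does not round the z scores to two decimals, its answer can differ from a z table answer in the third or fourth decimal place.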
Exercise: Suppose the average length of stay in a chronic disease hospital of
a certain type of patient is 60 days with a standard deviation of 15. If it is
reasonable to assume an approximately normal distribution of lengths of stay,
find the probability that a randomly selected patient from this group will have a
length of stay:
a) Greater than 50 days
b) Less than 30 days
c) Between 30 and 60 days
d) Greater than 90 days
Example I If the total cholesterol values for a certain population are
approximately normally distributed with a mean of 200 mg/100 ml and a
standard deviation of 20 mg/100 ml, find the probability that an individual
picked at random from this population will have a cholesterol value:
a) Between 180 and 200 mg/100 ml
b) Greater than 225 mg/100 ml
c) Less than 150 mg/100 ml
d) Between 190 and 210 mg/100 ml
Practice Exercise 9: The distribution of bladder volume in men is
approximately Normal with mean µ = 550 ml and standard deviation σ = 100
ml.
a) What proportion of male bladders are larger than 520 ml?
Cumulative Distribution Function
Normal with mean = 550 and standard deviation = 100
x      P( X <= x )
520    0.382089
b) What proportion of male bladders are between 530 and 560 ml?
Exercise J A nurse supervisor has found that staff nurses, on the average,
complete a certain task in 10 minutes. If the times required to complete the
task are approximately normally distributed with a standard deviation of 3
minutes, find:
a) The proportion of nurses completing the task in less than 4 minutes
b) The proportion of nurses requiring more than 5 minutes to complete the
task
c) The probability that a nurse who has just been assigned the task will
complete it within 3 minutes
Practice Exercise 10: Consider the normal curves that have the parameters
i) μ = 1.5 and σ = 3
ii) μ = 1.5 and σ = 6.2
iii) μ = −2.7 and σ = 3
iv) μ = 0 and σ = 1
a) Which curve has the largest spread?
b) Which curves are centered at the same place?
c) Which curves have the same shape?
d) Which curve is centered farthest to the left?
e) Which curve is the standard normal curve?
Another Standard Question:
Imagine that the mean male height is 70 inches, and the standard deviation is
4 inches.
a) What percent of men are less than 60 inches tall?
b) 50% of men are less than _______ inches?
c) 25% of men are less than _______inches?
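Parts b) and c) run the usual calculation backwards: a probability is given and the cutoff value is wanted. One way to sketch this in Python is with the inverse cumulative function of statistics.NormalDist. The helper names below are invented for this illustration.

```python
from statistics import NormalDist

# Mean and standard deviation as stated in the question above
heights = NormalDist(mu=70, sigma=4)

def percent_below(dist, x):
    """Percentage of the population with a value less than x."""
    return 100 * dist.cdf(x)

def cutoff_for(dist, fraction):
    """Value below which the given fraction of the population falls."""
    return dist.inv_cdf(fraction)

print(percent_below(heights, 60))   # part a)
print(cutoff_for(heights, 0.50))    # part b)
print(cutoff_for(heights, 0.25))    # part c)
```

By hand, the same cutoff comes from finding the z score with the required area to its left in the z table and then solving z = (x − μ)/σ for x, that is, x = μ + zσ.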
Practice Exercise 11: The brain weights of a certain population of adult
Swedish males follow approximately a normal distribution with mean 1,400 gm
and standard deviation 100 gm. What percentage of the brain weights are
a) 1,500 gm or less?
b) between 1,325 and 1,500 gm?
c) 1,325 gm or more?
d) between 1,200 and 1,600 gm?
Practice Exercise 12: The serum cholesterol levels of 12- to 14-year-olds
follow a normal distribution with mean 162 mg/dl and standard deviation 28
mg/dl. What percentage of 12 to 14-year-olds have serum cholesterol values
a) 171 or more?
b) 143 or less?
c) 194 or less?
d) 105 or more?
e) between 166 and 194?
f) between 105 and 138?
g) between 138 and 166?
MATH 2333: Statistics for Life Sciences
Practice Quiz (Chapter 4.8 – Binomial Distribution)
Name: ________________
1) Neuroblastoma, an extracranial solid cancer of childhood and the most
common cancer in infancy, is a serious but treatable disease. A urine test called
the VMA test has been developed that gives a positive diagnosis in about 70%
of cases of neuroblastoma.
Assume that a large number of children are to be tested, of whom n = 8 have
the disease. We are interested in whether or not the test detects the disease in
the 8 children who have the disease.
Find the probability that
a) none of the eight cases will be detected
b) all eight cases will be detected.
c) only one case will be missed.
d) between two and five cases, inclusive, will be detected.
2) Childhood lead poisoning is a public health concern. In a certain
population, 1 child in 8 has a high blood lead level (defined as 30 μg/dl or
more).
In a randomly chosen group of n = 16 children from the population, what is
the probability that
a) none has high blood lead?
b) 1 has high blood lead?
c) 2 have high blood lead?
d) 3 or more have high blood lead? [Hint: Use parts a) – c).]
Chapter Summary
X ~ Bin(n, p)
X is a discrete random variable following Binomial distribution
with parameters n and p.
μ = np, σ² = npq, σ = √(npq)
TI Calculator Commands: Press 2nd VARS
binompdf(n, p, x) for exact probability
binomcdf(n, p, x) for cumulative probability
X ~ Poisson(λ)
X is a discrete random variable following Poisson distribution
with parameter λ.
μ = λ, σ² = λ, σ = √λ
TI Calculator Commands: Press 2nd VARS
poissonpdf(µ, x) for exact probability
poissoncdf(µ, x) for cumulative probability
X ~ N(μ, σ)
X is a continuous random variable following Normal distribution with parameters
µ and σ.
Z ~ N(0, 1)
Z is a continuous random variable following Standard Normal distribution with
parameters 0 and 1.
TI Calculator Commands: Press 2nd VARS
normalcdf (minimum value, maximum value)
MATH 2333: Statistics for Life Sciences
Chapter 4,5 Discrete and Continuous Probability Distributions
Solutions to Selected Problems
_____________________________________________________________
Exercise 1 Explain why each of the following distributions is or is not a
probability distribution:
Not a probability distribution since ∑ P(X = x) ≠ 1.
Not a probability distribution since 0 ≤ P(X = x) ≤ 1 does not hold for all x.
Not a probability distribution since ∑ P(X = x) ≠ 1.
Is a probability distribution since 0 ≤ P(X = x) ≤ 1 and ∑ P(X = x) = 1.
Practice Exercise 2:
x       p(x)    x p(x)   x²    x² p(x)
0       0.75    0        0     0
1       0.13    0.13     1     0.13
2       0.05    0.1      4     0.2
3       0.04    0.12     9     0.36
4       0.02    0.08     16    0.32
5       0.01    0.05     25    0.25
Total   1       0.48           1.26
μ = ∑ x p(x) = 0.48
σ² = ∑ x² p(x) − μ² = 1.26 − (0.48)² = 1.0296
σ = √1.0296 = 1.0146
P(X ≥ 3) = 0.07
P(X = 2) = 0.05
P(X ≤ 4) = 0.99
P(X > 4) = 0.01
P(1 ≤ X ≤ 4) = 0.24
P(X = 0) = 0.75
Exercise A:
a) P(X = 3) = 15C3 (0.32)^3 (0.68)^12 = 0.1457
b) P(X < 5) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4)
= 15C0 (0.32)^0 (0.68)^15 + 15C1 (0.32)^1 (0.68)^14 + 15C2 (0.32)^2 (0.68)^13
+ 15C3 (0.32)^3 (0.68)^12 + 15C4 (0.32)^4 (0.68)^11
= 0.4477
c) P(5 ≤ X ≤ 9) = P(X = 5) + P(X = 6) + P(X = 7) + P(X = 8) + P(X = 9) = 0.546
d) P(5 < X < 10) = P(X = 6) + P(X = 7) + P(X = 8) + P(X = 9) = 0.33
e) P(X ≥ 6) = 1 − P(X ≤ 5) = 0.34
Practice Exercise 3:
a) P(X = 3) = 15C3 (0.12)^3 (0.88)^12 = 0.1695
b) P(0 ≤ X ≤ 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
= 15C0 (0.12)^0 (0.88)^15 + 15C1 (0.12)^1 (0.88)^14 + 15C2 (0.12)^2 (0.88)^13
+ 15C3 (0.12)^3 (0.88)^12
= 0.9041
c) μ = np = 15 × 0.12 = 1.8
σ = √(npq) = √(15 × 0.12 × 0.88) = 1.2585
Practice Exercise 4:
a) Binomial distribution with n = 10 and p = 0.07
b) P(X = 1) = 10C1 (0.07)^1 (0.93)^9 = 0.3642
P(X = 2) = 10C2 (0.07)^2 (0.93)^8 = 0.1233
c) P(X ≤ 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = 0.9964
d) P(X ≥ 1) = 1 − P(X < 1) = 1 − P(X = 0) = 1 − 10C0 (0.07)^0 (0.93)^10 = 0.5160
e) μ = np = 10 × 0.07 = 0.7
σ = √(npq) = √(10 × 0.07 × 0.93) = 0.8068
Practice Exercise 5: (Binomial)
Exercise G:
a) No cases of GBS
Poisson with mean = 1.3
x    P( X = x )
0    0.2725
b) At least one case of GBS
P(X ≥ 1) = 1 − P(X = 0) = 1 − 0.2725 = 0.7275
c) Fewer than five cases of GBS
P(X ≤ 4) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) = 0.989
Poisson with mean = 1.3
x    P( X <= x )
4    0.989337
Example H:
P(292 ≤ x ≤ 649) = P((292 − 491)/119 ≤ (x − μ)/σ ≤ (649 − 491)/119) = P(−1.67 ≤ Z ≤ 1.33) = 0.8606
There is a 0.8606 probability that on a random day, the subject’s ammonia
concentration is between 292 ppb and 649 ppb.
Practice Exercise 7: Given that a sample is approximately bell-shaped with a
mean of 120 and a standard deviation of 10, the approximate percentage of
data values that is expected to fall between 100 and 140 is
a) 75 percent.
b) 95 percent.
c) 68 percent.
d) 99.7 percent.
Practice Exercise 8: Which of the following is not true about the standard
normal distribution?
a) The total probability under the standard normal curve is 1.
b) The standard normal curve is symmetric about 0.
c) About 68% of its observations fall between -1 and 1.
d) The probability under the standard normal curve to the left of z = 0 is
negative
e) About 95% of its observations fall between -2 and 2.
Practice Exercise 10: Consider the normal curves that have the parameters
i) μ = 1.5 and σ = 3
ii) μ = 1.5 and σ = 6.2
iii) μ = −2.7 and σ = 3
iv) μ = 0 and σ = 1
a) Which curve has the largest spread?
(ii) only.
b) Which curves are centered at the same place?
(i) and (ii)
c) Which curves have the same shape?
(i) and (iii)
d) Which curve is centered farthest to the left?
(iii) only.
e) Which curve is the standard normal curve?
(iv) only.
Exercise J:
a) P(x < 4) = P(Z < (4 − 10)/3) = 0.0228
b) P(x > 5) = P(Z > (5 − 10)/3) = 1 − 0.0475 = 0.9525
c) P(x < 3) = P(Z < (3 − 10)/3) = P(Z < −2.33) = 0.0099
Practice Exercise 11: The brain weights of a certain population of adult
Swedish males follow approximately a normal distribution with mean 1,400 gm
and standard deviation 100 gm. What percentage of the brain weights are
a) 1,500 gm or less?
Cumulative Distribution Function
Normal with mean = 1400 and standard deviation = 100
x       P( X <= x )
1500    0.841345
P(X ≤ 1500) = 0.8413
b) between 1,325 and 1,500 gm?
Cumulative Distribution Function
Normal with mean = 1400 and standard deviation = 100
x       P( X <= x )
1500    0.841345
1325    0.226627
P(1325 ≤ X ≤ 1500) = 0.8413 – 0.2266 = 0.6147
c) 1,325 gm or more?
Cumulative Distribution Function
Normal with mean = 1400 and standard deviation = 100
x       P( X <= x )
1325    0.226627
P(X ≥ 1325) = 1 – P(X ≤ 1325) = 1 – 0.2266 = 0.7734
d) between 1,200 and 1,600 gm?
Cumulative Distribution Function
Normal with mean = 1400 and standard deviation = 100
x       P( X <= x )
1600    0.977250
1200    0.0227501
P(1200 ≤ X ≤ 1600) = 0.9772 – 0.0227 = 0.9545
Homework # 2
Exercise 1 (Binomial Distribution): It is known that 17 percent of the
members of a certain population are highly susceptible to juvenile
diabetes.
What is the probability that in a sample of n = 10 subjects drawn at random
from this population exactly 2 will have juvenile diabetes?
Exercise 2 (Binomial Distribution): A certain drug treatment cures 90% of
cases of hookworm in children. Suppose that 15 children suffering from
hookworm are to be treated. Let X be the number of cured among 15 children
treated for the disease.
a) Find P(X < 13)
b) Find P(10 < X ≤ 15)
Exercise 3 (Poisson Distribution): A newborn baby is considered to have a
low birth weight if it weighs less than 2500 grams. Such babies often require
extra care.
Cypress County, AB, has been experiencing a mean of 206 cases of low birth
weight each year (This figure was made up, not true).
Find the probability that on a given day, there is more than 1 baby born with a
low birth weight.
Exercise 4 (Poisson Distribution): In a certain population an average of 13
new cases of esophageal cancer are diagnosed each year. If the annual
incidence of esophageal cancer follows a Poisson distribution, find the
probability that in a given year the number of newly diagnosed cases of
esophageal cancer will be:
a) Exactly 10
b) Between 9 and 11, inclusive
Homework # 2 – Solutions
Exercise 1 (Binomial Distribution): Given n = 10 and p = 0.17
P(X = 2) = 10C2 (0.17)^2 (0.83)^8 = [10! / (2!(10 − 2)!)] (0.17)^2 (0.83)^8 = 0.2929
Exercise 2 (Binomial Distribution):
a) P(X < 13) = P(X ≤ 12) = 0.1840
Cumulative Distribution Function
Binomial with n = 15 and p = 0.9
x     P( X <= x )
12    0.184061
b) P(10 < X ≤ 15) = P(X ≤ 15) – P(X ≤ 10) = 1 – 0.0127 = 0.9873
Cumulative Distribution Function
Binomial with n = 15 and p = 0.9
x     P( X <= x )
10    0.0127205
Exercise 3 (Poisson Distribution): The average number of cases per day is
λ = 206/365 = 0.564. X ~ Poisson(λ = 0.564)
P(X > 1) = 1 − P(X ≤ 1) = 1 − [P(X = 0) + P(X = 1)]
= 1 − [e^(−0.564) (0.564)^0 / 0! + e^(−0.564) (0.564)^1 / 1!]
= 1 − 0.5689 − 0.3209 = 0.1102
Exercise 4 (Poisson Distribution):
a) P(X = 10) = 0.0859
Probability Density Function
Poisson with mean = 13
x     P( X = x )
10    0.0858702
b) P(9 ≤ X ≤ 11) = P(X ≤ 11) – P(X ≤ 8) = 0.3532 – 0.0998 = 0.2534
Cumulative Distribution Function
Poisson with mean = 13
x     P( X <= x )
11    0.353165
Cumulative Distribution Function
Poisson with mean = 13
x     P( X <= x )
8     0.0997579
MATH 2333: Statistics for Life Sciences
Practice Problems for Term Test # 2
1) Many new drugs have been introduced in the last several decades to bring
hypertension under control – that is, to reduce high blood pressure to
normotensive levels. Suppose a physician agrees to use a new antihypertensive
drug on a trial basis on the first 4 untreated hypertensives she encounters in her
practice, before deciding whether to adopt the drug for routine use.
Let X = the number of patients out of 4 who are brought under control. Then X is
a discrete random variable which takes on the value of 0, 1, 2, 3, and 4.
Suppose from previous experience with the drug, the drug company expects that
for any clinical practice the probability that 0 patients out of 4 will be brought
under control is 0.008, 1 patient out of 4 is 0.076, 2 patients out of 4 is 0.265, 3
patients out of 4 is 0.411 and all 4 patients is 0.240.
a) Draw the distribution table
b) Check that it is a correct distribution
c) Compute the following probabilities
P  X  2 ,
d) Calculate
P  X  1 ,
P 1  X  3
µ = E(X), σ2 = Var(X) and σ = SD(X)
Source: Fundamentals of Biostatistics, 6th Edition by Bernard Rosner (2006)
2) A group of college students were surveyed to learn how many times they had
visited a dentist in the previous year. The probability distribution for Y, the
number of visits, is given by the following table:
Y (number of visits)   0      1      2      3      Total
P(Y = y)               0.15   0.50   0.30   0.05   1
Calculate the mean, μ, of the number of visits and the standard deviation, σ, of
the random variable Y.
3) A recent study reported that the prevalence of hyperlipidemia (defined as total
cholesterol over 200) is 40% in children 2 to 6 years of age. If 15 children are
analyzed:
a) What is the probability that at least 3 are hyperlipidemic?
b) How many would be expected to meet the criteria for hyperlipidemia? In other
words, what is the mean number of children expected to be hyperlipidemic?
4) Researchers looked at the occurrence of retinal capillary hemangioma (RCH) in
patients with von Hippel–Lindau (VHL) disease. RCH is a benign vascular tumor of
the retina. Using a retrospective consecutive case series review, the researchers
found that the number of RCH tumor incidents followed a Poisson distribution with
λ = 5 tumors per eye for patients with VHL.
Find the probability that in a randomly selected patient with VHL there are
between four and six occurrences of tumors per eye, inclusive.
5) Alpha fetoprotein (AFP) is a substance produced by a fetus that can be
measured in pregnant women to assess the probability of problems with fetal
development. High levels of AFP have been seen in babies with neural-tube
defects. When measured at 15-20 weeks gestation, AFP is normally distributed
with a mean of μ = 58 and a standard deviation of σ = 18.
What is the probability that AFP exceeds 75 in a pregnant woman measured at 18
weeks gestation? In other words, what is P(X > 75)? – Chapter 4.
In a sample of n = 50 women, what is the probability that their mean AFP exceeds
62? In other words, what is P(X̄ > 62)?
This part requires the application of the Central Limit Theorem (CLT) – Chapter 5.
P(X̄ > 62) = P((X̄ − μ)/(σ/√n) > (62 − 58)/(18/√50)) = P(Z > 1.57) = 1 − P(Z < 1.57) = 1 − 0.9418 = 0.0582
There is about a 5.8% chance that the sample mean AFP of 50 women will exceed 62.
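The sample-mean calculation can be sketched in Python as a check on the hand computation. The only change from a single-observation problem is that the standard deviation σ is replaced by the standard error σ/√n. The function name is invented for this sketch.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_exceeds(threshold, mu, sigma, n):
    """P(sample mean > threshold), using the normal model with sd sigma/sqrt(n)."""
    standard_error = sigma / sqrt(n)
    sampling_dist = NormalDist(mu, standard_error)
    return 1 - sampling_dist.cdf(threshold)

# Values from the AFP problem: mu = 58, sigma = 18, n = 50
print(prob_sample_mean_exceeds(62, mu=58, sigma=18, n=50))
```

The code keeps the z score unrounded, so its result can differ from the table-based 0.0582 in the fourth decimal place.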
6) According to data from the second National Health and Nutrition Examination
Survey, (NHANES II, NCHS 1992), 26.8 percent of persons 20–74 years of age in
the U.S. had high serum cholesterol values.
In a sample of n = 20 persons ages 20–74 years, what is the probability that less
than 2 persons will have high serum cholesterol?
7) Based on reports from the Public Health Agency of Canada and the World
Health Organization, the estimated incidence of tuberculosis (all forms of TB not
just smear positive TB) on average is 5 cases per 100,000 population in Canada.
This follows a Poisson process with λ = 5.
What is the probability of a health department, in a provincial county of 100,000,
observing between 4 and 6 cases, inclusive, if the national rate held in the county?
(Source: http://www.phac-aspc.gc.ca/tbpc-latb/itir-eng.php)
8) Suppose it is known from previous treatment data that the probability of
recovery for a certain disease is p = 0.35. If n = 12 people are stricken with the
disease, what is the probability that:
a) 3 or more will recover?
b) between 4 and 6, inclusive, will recover?
9) Neuroblastoma, an extracranial solid cancer of childhood and the most
common cancer in infancy, is a serious but treatable disease. A urine test called the
VMA test has been developed that gives a positive diagnosis in about 70% of
cases of neuroblastoma.
Assume that a large number of children are to be tested, of whom a sample of
eight children have the disease. We are interested in whether or not the test
detects the disease in the sample of children who have the disease.
Find the probability that between four and seven cases, inclusive, will be detected.
10) Tetanus is caused by a neurotoxic spore (a powerful poison that acts on
nerves of the spinal cord) produced by the tetanus bacterium Clostridium tetani.
The disease, which used to kill about 40 to 50 Canadians a year in the 1920s and
30s, is now only rarely reported. In recent years, Canada has seen, on average,
only a couple of cases a year.
The number of cases of tetanus reported in Canada during 2011 has a Poisson
distribution with parameter λ = 2. What is the probability that two or more cases
of tetanus will be reported during a given year?
11) The heights of men (in inches) in a certain population follow a normal
distribution with mean µ = 69.7 inches and standard deviation σ = 3.1 inches.
If a man is chosen at random from the population, find the probability that he will
be less than 72 inches tall.
12) A certain drug causes kidney damage in 1% of patients. Suppose the drug is
to be tested on 50 patients. Find the probability that
a) none of the patients will experience kidney damage.
b) one or more of the patients will experience kidney damage. [Hint: Use part a)
to answer part b).]
13) According to the British Columbia Centre of Excellence for Women's Health,
the Canadian caesarean section (C-section) rate is approximately p = 26%.
Suppose a random sample of n = 12 deliveries is selected from the hospital's
delivery records for 2013. What is the probability that
a) exactly 3 babies were delivered by C-section?
b) fewer than 3 babies were delivered by C-section?
14) The serum cholesterol levels of 12- to 14-year-olds follow a normal
distribution with mean 162 mg/dl and standard deviation 28 mg/dl.
Determine the percentage of 12- to 14-year-olds who have serum cholesterol values
between 166 mg/dl and 194 mg/dl.
15) In a health examination survey of a particular province in Canada, the fasting
blood glucose level for the population is normally distributed with a mean of µ =
99.0 and a standard deviation of σ = 12.
Determine the probability that an individual selected at random will have a blood
sugar reading greater than 120.
16) A group of college students were surveyed to learn how many times they
had visited a dentist in the previous year. The discrete probability distribution for
X, the number of visits, is given by the following table:
x (number of visits)    0       1       2       3       4       Total
P(X = x)                0.15    0.40    0.20    0.10    0.15    1
Calculate the mean, µ, of the number of visits and the standard deviation, σ, of
the random variable X.
Compute the following probabilities:
P(X ≤ 2),
P(X > 1),
P(1 ≤ X ≤ 3)
17) Microfracture knee surgery has a 75% chance of success on patients with
degenerative knees. The surgery is performed on ten patients.
a) Find the probability that the surgery is successful on exactly six patients.
b) Find the probability that the surgery is successful on between six and eight
patients, inclusive.
c) What are the mean and standard deviation of the number of patients for
whom the surgery is successful?
18) Tetanus is caused by a neurotoxic spore (a powerful poison that acts on
nerves of the spinal cord) produced by the tetanus bacterium Clostridium tetani.
The disease, which used to kill about 40 to 50 Canadians a year in the 1920s and
30s, is now only rarely reported. In recent years, Canada has rarely seen any
cases. (Source: Canadian Notifiable Disease Surveillance System and the Public
Health Agency of Canada.)
There were four cases of tetanus reported in Canada during 2007. Suppose the
occurrence of tetanus cases follows a Poisson distribution with λ = 4 per year.
During a given year, what is the probability that three or more cases of
tetanus will be reported?
19) A survey was conducted to measure the height of U.S. men. In the
survey, respondents were grouped by age. In the 20–29 age group, the
heights were normally distributed, with a mean of 69.2 inches and a standard
deviation of 2.9 inches. A study participant is randomly selected. (Source: U.S.
National Center for Health Statistics)
DRAW AND LABEL TWO NORMAL CURVES FOR EACH PART & SHADE AREA UNDER CURVE
a) Find the probability that his height is less than 66 inches.
b) Find the probability that his height is between 66 and 72 inches.
c) Find the probability that his height is more than 72 inches.
MATH 2333: Statistics for Life Sciences
Practice Problems for Term Test # 2 – Solutions
1) X is a discrete random variable taking values x = 0, 1, 2, 3, 4. The probability
distribution is given by

x        p(x)     x p(x)    x²     x² p(x)
0        0.008    0         0      0
1        0.076    0.076     1      0.076
2        0.265    0.53      4      1.06
3        0.411    1.233     9      3.699
4        0.24     0.96      16     3.84
Total    1        2.799     *      8.675

$$P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.008 + 0.076 + 0.265 = 0.349$$
$$P(X > 1) = 1 - P(X \le 1) = 1 - 0.084 = 0.916$$
$$P(1 \le X \le 3) = P(X = 1) + P(X = 2) + P(X = 3) = 0.076 + 0.265 + 0.411 = 0.752$$
$$\mu = \sum x\,p(x) = 2.799$$
$$\sigma^2 = \sum x^2 p(x) - \mu^2 = 8.675 - (2.799)^2 = 0.8406$$
$$\sigma = \sqrt{0.8406} = 0.9168$$
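The table arithmetic in solution 1 can be checked numerically. The following is a minimal Python sketch and not part of the original solutions; the helper name `summarize` and the dictionary layout are illustrative assumptions.

```python
import math

def summarize(dist):
    """Return (mean, variance, sd) for a discrete distribution {x: p(x)}."""
    # A valid probability distribution must sum to 1 (allowing for float error).
    if not math.isclose(sum(dist.values()), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    mean = sum(x * p for x, p in dist.items())              # sum of x p(x)
    second_moment = sum(x**2 * p for x, p in dist.items())  # sum of x^2 p(x)
    variance = second_moment - mean**2
    return mean, variance, math.sqrt(variance)

# Distribution from solution 1 (hypothetical variable name).
dist = {0: 0.008, 1: 0.076, 2: 0.265, 3: 0.411, 4: 0.24}
mean, variance, sd = summarize(dist)
print(f"mean = {mean:.3f}, variance = {variance:.4f}, sd = {sd:.4f}")

# Event probabilities are sums of table entries over the values in the event.
p_at_most_2 = sum(p for x, p in dist.items() if x <= 2)
p_more_than_1 = 1 - sum(p for x, p in dist.items() if x <= 1)
print(f"P(X <= 2) = {p_at_most_2:.3f}, P(X > 1) = {p_more_than_1:.3f}")
```

The same helper should apply to any table of this kind, for example the dentist-visit distribution in solution 2.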
2)

x (number of visits)    P(X = x)    x p(x)    x²    x² p(x)
0                       0.15        0         0     0
1                       0.50        0.50      1     0.50
2                       0.30        0.60      4     1.20
3                       0.05        0.15      9     0.45
Total                   1           1.25      *     2.15

$$\mu = \sum x\,p(x) = 1.25$$
$$\sigma = \sqrt{\sum x^2 p(x) - \mu^2} = \sqrt{2.15 - (1.25)^2} = 0.766$$
3) Binomial distribution: Let X be the number of children who are
reported to be hyperlipidemic from a sample of n = 15. The probability of
success is p = 0.40.
a)
$$
\begin{aligned}
P(X \ge 3) &= 1 - P(X \le 2) = 1 - \left[ P(X = 0) + P(X = 1) + P(X = 2) \right] \\
&= 1 - \left[ {}_{15}C_{0}\,(0.40)^{0}(0.60)^{15} + {}_{15}C_{1}\,(0.40)^{1}(0.60)^{14} + {}_{15}C_{2}\,(0.40)^{2}(0.60)^{13} \right] \\
&= 1 - 0.0271 = 0.9729
\end{aligned}
$$
c) Expected number of children to be hyperlipidemic is
$\mu = np = 15 \times 0.40 = 6$ children.
4)
$$
\begin{aligned}
P(4 \le X \le 6) &= P(X = 4) + P(X = 5) + P(X = 6) \\
&= \frac{e^{-5}\,5^{4}}{4!} + \frac{e^{-5}\,5^{5}}{5!} + \frac{e^{-5}\,5^{6}}{6!} \\
&= 0.175 + 0.175 + 0.147 = 0.4970
\end{aligned}
$$
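The Poisson terms above can be reproduced directly from the formula P(X = k) = e^(−λ) λ^k / k!. The following is a minimal Python sketch under that formula; the function name `poisson_pmf` and the variable names are illustrative assumptions, not part of the original solutions.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

lam = 5  # Poisson mean used in the solution above
terms = {k: poisson_pmf(k, lam) for k in range(4, 7)}  # k = 4, 5, 6
p_4_to_6 = sum(terms.values())

for k, value in terms.items():
    print(f"P(X = {k}) = {value:.4f}")
print(f"P(4 <= X <= 6) = {p_4_to_6:.4f}")
```

Summing the unrounded terms gives a total close to 0.497. Individual terms may differ from the hand-rounded values in the last printed digit.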
5) This addresses the probability of observing a single woman with an AFP
exceeding 75.
$$P(X > 75) = P\left( \frac{X - \mu}{\sigma} > \frac{75 - 58}{18} \right) = P(Z > 0.94) = 1 - P(Z \le 0.94) = 1 - 0.8264 = 0.1736$$
6) The data follow a binomial distribution with $n = 20$, $p = 0.268$, $q = 0.732$.
$$
\begin{aligned}
P(X < 2) &= P(X = 0) + P(X = 1) \\
&= \binom{20}{0}(0.268)^{0}(0.732)^{20} + \binom{20}{1}(0.268)^{1}(0.732)^{19} \\
&= 0.0019 + 0.0143 = 0.0162
\end{aligned}
$$
7) Poisson distribution:
$$
\begin{aligned}
P(4 \le X \le 6) &= P(X = 4) + P(X = 5) + P(X = 6) \\
&= \frac{e^{-5}\,5^{4}}{4!} + \frac{e^{-5}\,5^{5}}{5!} + \frac{e^{-5}\,5^{6}}{6!} \\
&= 0.1754 + 0.1754 + 0.1462 = 0.497
\end{aligned}
$$
8) Let X be the number of people who recover from the disease. Here X is a
discrete random variable following a binomial distribution with n = 12 and
probability of success p = 0.35.
a.
$$
\begin{aligned}
P(X \ge 3) &= 1 - P(X \le 2) = 1 - \left[ P(X = 0) + P(X = 1) + P(X = 2) \right] \\
&= 1 - \left[ {}_{12}C_{0}\,(0.35)^{0}(0.65)^{12} + {}_{12}C_{1}\,(0.35)^{1}(0.65)^{11} + {}_{12}C_{2}\,(0.35)^{2}(0.65)^{10} \right] \\
&= 0.8487
\end{aligned}
$$
b.
$$
\begin{aligned}
P(4 \le X \le 6) &= P(X = 4) + P(X = 5) + P(X = 6) \\
&= {}_{12}C_{4}\,(0.35)^{4}(0.65)^{8} + {}_{12}C_{5}\,(0.35)^{5}(0.65)^{7} + {}_{12}C_{6}\,(0.35)^{6}(0.65)^{6} \\
&= 0.2367 + 0.2039 + 0.1281 = 0.5687
\end{aligned}
$$
9) Binomial Distribution
$$
\begin{aligned}
P(4 \le X \le 7) &= P(X = 4) + P(X = 5) + P(X = 6) + P(X = 7) \\
&= \binom{8}{4}(0.70)^{4}(0.30)^{4} + \binom{8}{5}(0.70)^{5}(0.30)^{3} + \binom{8}{6}(0.70)^{6}(0.30)^{2} + \binom{8}{7}(0.70)^{7}(0.30)^{1} \\
&= 0.1361 + 0.2541 + 0.2965 + 0.1977 = 0.8844
\end{aligned}
$$
10) Poisson distribution: X = number of cases of tetanus reported, with λ = 2.
$$
\begin{aligned}
P(X \ge 2) &= 1 - P(X \le 1) = 1 - \left[ P(X = 0) + P(X = 1) \right] \\
&= 1 - \left[ \frac{e^{-2}\,2^{0}}{0!} + \frac{e^{-2}\,2^{1}}{1!} \right] = 1 - 0.4060 = 0.5940
\end{aligned}
$$
11) Normal Distribution
$$P(X < 72) = P\left( \frac{X - \mu}{\sigma} < \frac{72 - 69.7}{3.1} \right) = P(Z < 0.74) = 0.7703$$
12) Binomial Distribution
a) $P(X = 0) = \binom{50}{0}(0.01)^{0}(0.99)^{50} = 0.6050$
b) $P(X \ge 1) = 1 - P(X = 0) = 1 - 0.6050 = 0.3950$
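The complement rule used in part b), P(X ≥ k) = 1 − P(X ≤ k − 1), can be written as a small helper. The following is a minimal Python sketch; the names `binom_pmf` and `prob_at_least` are illustrative assumptions, not part of the original solutions.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least(k, n, p):
    """P(X >= k), computed as 1 - P(X <= k - 1)."""
    return 1 - sum(binom_pmf(j, n, p) for j in range(k))

n, p = 50, 0.01  # 50 patients, 1% chance of kidney damage each
p_none = binom_pmf(0, n, p)
p_one_or_more = prob_at_least(1, n, p)
print(f"P(X = 0) = {p_none:.4f}")
print(f"P(X >= 1) = {p_one_or_more:.4f}")
```

The same helper should cover other "k or more" binomial questions in this set, for example `prob_at_least(3, 12, 0.35)` for solution 8 a.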
13) Binomial Distribution
a) $P(X = 3) = {}_{12}C_{3}\,(0.26)^{3}(0.74)^{9} = 0.2573$

Probability Density Function
Binomial with n = 12 and p = 0.26
x    P( X = x )
3    0.257293

b)
$$
\begin{aligned}
P(X < 3) &= P(X \le 2) \\
&= P(X = 0) + P(X = 1) + P(X = 2) \\
&= {}_{12}C_{0}\,(0.26)^{0}(0.74)^{12} + {}_{12}C_{1}\,(0.26)^{1}(0.74)^{11} + {}_{12}C_{2}\,(0.26)^{2}(0.74)^{10} \\
&= 0.3603
\end{aligned}
$$

Cumulative Distribution Function
Binomial with n = 12 and p = 0.26
x    P( X <= x )
2    0.360338
14) X = serum cholesterol levels of 12- to 14-year-olds.
X ~ N(µ = 162, σ = 28).
P(166 ≤ X ≤ 194) = P(0.14 ≤ Z ≤ 1.14) = 0.8735 – 0.5568 = 0.3167
Cumulative Distribution Function
Normal with mean = 162 and standard deviation = 28
x      P( X <= x )
194    0.873451

Cumulative Distribution Function
Normal with mean = 162 and standard deviation = 28
x      P( X <= x )
166    0.556798
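The interval probability for solution 14 can be reproduced with Python's standard library. The following is a minimal sketch using `statistics.NormalDist` (Python 3.8 or later); the variable names are illustrative assumptions. It compares the unrounded calculation with the table-style route that rounds each z-score to two decimals.

```python
from statistics import NormalDist

mu, sigma = 162, 28   # serum cholesterol, mg/dl
lo, hi = 166, 194     # interval endpoints from the problem

# Unrounded route: evaluate the N(mu, sigma) CDF at both endpoints.
X = NormalDist(mu=mu, sigma=sigma)
p_exact = X.cdf(hi) - X.cdf(lo)

# Table route: standardise, round z to two decimals, then use the
# standard normal CDF, as one would with a printed Z table.
Z = NormalDist()
z_lo = round((lo - mu) / sigma, 2)   # 0.14
z_hi = round((hi - mu) / sigma, 2)   # 1.14
p_table = Z.cdf(z_hi) - Z.cdf(z_lo)

print(f"unrounded: {p_exact:.4f}")
print(f"rounded z: {p_table:.4f}")
```

The two routes can differ slightly in the third or fourth decimal place because rounding z moves both endpoints a little. An upper-tail probability such as P(X > 120) follows the same pattern as `1 - NormalDist(mu, sigma).cdf(120)`.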
15) X ~ N(µ = 99.0, σ = 12).
$$
\begin{aligned}
P(X > 120) &= P\left( \frac{X - \mu}{\sigma} > \frac{120 - 99}{12} \right) = P(Z > 1.75) \\
&= 1 - P(Z \le 1.75) \\
&= 1 - 0.9599 = 0.0401
\end{aligned}
$$