Topic 3-1 Mean and variance; binomial and Poisson distribution
Binomial distribution:
Motivating example: If four coins are flipped, each with P(H) = 0.7, what is the probability of getting exactly two heads (H) in total? One outcome that realises this event is $H_1 H_2 T_3 T_4$ (i.e. heads on the 1st and 2nd coins, tails on the 3rd and 4th). Because the coins are statistically independent (s.i.), its probability is
$$P(H_1 H_2 T_3 T_4) = P(H_1)\,P(H_2)\,P(T_3)\,P(T_4) = 0.7 \times 0.7 \times 0.3 \times 0.3 = 0.7^2 \times 0.3^2.$$
Many other (mutually exclusive) outcomes also count, such as $H_1 T_2 H_3 T_4$, and each has the same probability $0.7^2 \times 0.3^2$. How many are there? It is the number of ways to pick 2 out of 4 places to be labelled “H” (and the rest “T”), which is
$$C_2^4 = \frac{4!}{2!\,(4-2)!} = 6.$$
Since the outcomes are all mutually exclusive,
$$P(\text{two heads}) = P(H_1 H_2 T_3 T_4 \text{ or } H_1 T_2 H_3 T_4 \text{ or } \dots) = P(H_1 H_2 T_3 T_4) + P(H_1 T_2 H_3 T_4) + \dots = C_2^4 \times 0.7^2 \times 0.3^2 = 0.2646.$$
Thus, in the general case, the probability of x “successes” among n independent trials, with P(success) = p in each trial, is given by the binomial distribution
$$P(X = x) = \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x} \qquad (x = 0, 1, 2, \dots, n)$$
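As a quick numerical check, the binomial formula can be evaluated directly for the coin example. The following is only a minimal Python sketch; the function name `binomial_pmf` is an illustrative choice, not something defined in these notes.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for x successes in n independent trials, each with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Coin example from the text: 4 flips, P(H) = 0.7, exactly 2 heads.
p_two_heads = binomial_pmf(2, 4, 0.7)
print(round(p_two_heads, 4))

# Sanity check: the probabilities over x = 0..n should add up to 1.
total = sum(binomial_pmf(x, 4, 0.7) for x in range(5))
print(total)
```

Using `math.comb` for the number of arrangements avoids computing the three factorials separately.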
Mean and variance of a discrete random variable
Suppose X can take on discrete values $x_1, x_2, \dots, x_k$ (e.g. how much a gambler is paid at the conclusion of a game) with respective probabilities $p_1, p_2, \dots, p_k$. If the experiment is performed a large number of times (M), one would expect $x_1$ to appear $p_1 M$ times, $x_2$ to appear $p_2 M$ times, etc., so the total (e.g. total winnings of the gambler) is $x_1(p_1 M) + x_2(p_2 M) + \dots + x_k(p_k M)$. Dividing by M gives the average/mean/expected value (“winnings per game”),
$$E(X) \equiv \mu_X = \sum_{i=1}^{k} x_i\, p_i$$
Example: for a binomial random variable, the mean is
$$E(X) = \sum_{x=0}^{n} x\, \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x} = np$$
Besides the mean, the next important quantity of a random variable is its variability, i.e. how widely its possible values are spread around the mean. This can be measured by the variance of a random variable,
$$\mathrm{Var}(X) = \sum_{i=1}^{k} (x_i - \mu_X)^2\, p_i$$
For a binomial random variable, its variance is
$$\mathrm{Var}(X) = \sum_{x=0}^{n} (x - np)^2\, \frac{n!}{x!\,(n-x)!}\, p^x (1-p)^{n-x} = np(1 - p),$$
and therefore its standard deviation is $\sigma_X = \sqrt{np(1-p)}$.
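The two defining sums can be evaluated numerically and compared with the closed forms np and np(1 – p). The sketch below assumes a binomial variable with n = 4 and p = 0.7 purely for illustration; `mean_and_variance` is a hypothetical helper name.

```python
from math import comb, sqrt

def mean_and_variance(values, probs):
    """Mean and variance of a discrete random variable from its values and probabilities."""
    mean = sum(x * p for x, p in zip(values, probs))
    var = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
    return mean, var

# Illustrative binomial variable (assumed values): n = 4 trials, p = 0.7.
n, p = 4, 0.7
values = list(range(n + 1))
probs = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in values]

mean, var = mean_and_variance(values, probs)
std = sqrt(var)
print(mean, n * p)            # the two numbers should agree closely
print(var, n * p * (1 - p))   # the two numbers should agree closely
```

The same helper works for any discrete distribution, e.g. the gambler's payoffs mentioned in the text, as long as the probabilities sum to one.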
Poisson distribution:
Motivating example: Consider people calling a hotline, where historical records give the average rate, $\nu = 1.5$ calls per minute. How can we find P(8 calls in 6 minutes)?
Note that the given average rate is equivalent to
$$\frac{1.5}{1\ \text{min}} = \frac{1.5 \times 6}{1\ \text{min} \times 6} = \frac{9}{6\ \text{min}},$$
i.e. for 6 minutes, the average number of calls is 9 (denote this number by $\lambda$). We may first try to solve the problem using an (approximate) binomial model:
Solution 1: Divide the total time into twelve short intervals (30 seconds each) to fit a Bernoulli model (with “n” = 12). Assumption: only 0 or 1 call can occur in each interval since it is short. To find “p”, recall that the (binomial) average is np, which must give the right answer $\lambda = 9$, hence we must use
$$p = \frac{\lambda}{n} = \frac{9}{12}$$
Now the problem is translated into “P(8 successes in 12 trials with p = 9/12)”, which is
$$\binom{12}{8} \left(\frac{9}{12}\right)^{8} \left(1 - \frac{9}{12}\right)^{12-8} = 0.194$$
This answer is not exactly right, since an interval may actually have more than 1 call, so we go on and compute:
Solution 2: Divide the total time into smaller intervals, say of 10 seconds each, so n = 36 and p = 9/36. Note that p decreases while n goes up, with np being fixed at 9, a constant. Now we have P(8 calls in 36 intervals) =
$$\binom{36}{8} \left(\frac{9}{36}\right)^{8} \left(1 - \frac{9}{36}\right)^{36-8} = 0.147,$$
which should be a better answer than Solution 1. We can go on like this:
Interval length (sec)    n          p          P(X = 8) by binomial
30                       12         0.75       0.193577707
10                       36         0.25       0.146591655
3.6                      100        0.09       0.136604904
0.36                     1000       0.009      0.132219053
0.036                    10000      0.0009     0.131801777
0.0036                   100000     0.00009    0.131760252
0.00036                  1000000    0.000009   0.131756101
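The refinement in the table can be reproduced with a short loop over n, keeping np = 9 fixed. This is a minimal Python sketch under that assumption; the variable names are illustrative.

```python
from math import comb

total_seconds = 6 * 60   # the 6-minute observation window
lam = 9                  # expected number of calls in the window (np is held at this value)
x = 8                    # number of calls of interest

rows = []
for n in (12, 36, 100, 1_000, 10_000, 100_000, 1_000_000):
    p = lam / n                                     # success probability per interval
    prob = comb(n, x) * p**x * (1 - p) ** (n - x)   # binomial P(X = 8)
    rows.append((total_seconds / n, n, p, prob))

for interval, n, p, prob in rows:
    print(f"{interval:>10.5g} {n:>9d} {p:>10.6g} {prob:.9f}")
```

The printed probabilities should decrease with n and level off, which is the behaviour the table illustrates.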
What is the exact solution? Let us formulate the general answer: Suppose X has a binomial distribution, but n becomes huge (many trials) while p gets tiny (“success” is very rare), and $\lambda \equiv np$ (the mean total number of successes) stays constant, i.e. neither huge nor tiny. Then for any particular x value,
$$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x} = \frac{n!}{(n-x)!\,x!} \cdot \frac{(np)^x}{n^x} \cdot \frac{(1-p)^n}{(1-p)^x}$$
$$= \left[\frac{n(n-1)\cdots(n-x+1)}{n^x}\right] \left[\frac{(np)^x}{x!}\right] \left(1 - \frac{\lambda}{n}\right)^{n} (1-p)^{-x} \qquad (\lambda \equiv np)$$
As $n \to \infty$ and $p \to 0$ with $np = \lambda$ (constant), the first and last factors both approach one. Also, because of the mathematical fact that $(1 + c/n)^n$ approaches $e^c$ as $n \to \infty$ for any constant c, the factor $(1 - \lambda/n)^n$ approaches $e^{-\lambda}$, and P(X = x) becomes
$$f(x; \lambda) = \frac{\lambda^x}{x!}\, e^{-\lambda} \qquad (x = 0, 1, 2, 3, \dots) \qquad (1)$$
which is called a Poisson distribution with mean $\lambda$. The variance for a Poisson distribution is also $\lambda$, since the (binomial) variance is $np(1 - p) \to np(1) = np = \lambda$.
Hence, returning to the original question, we recognize the problem is a Poisson type, with $\lambda$, the expected number of calls in the 6-minute period, equal to (1.5 per min) × (6 min) = 9, hence $P(X = 8) = e^{-9}\, 9^8 / 8! = 0.13175564$.
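The Poisson answer to the hotline question can be computed in a few lines. This is a minimal Python sketch; `poisson_pmf` is an illustrative name, not a function from these notes.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam**x / factorial(x) * exp(-lam)

rate_per_min = 1.5             # average call rate from the example
minutes = 6                    # length of the observation window
lam = rate_per_min * minutes   # expected number of calls in the window

p_eight_calls = poisson_pmf(8, lam)
print(f"{p_eight_calls:.8f}")
```

Unlike the binomial approximation, no choice of interval length is needed here; only the expected count over the window enters the formula.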
Solved problems
Problem 3-1-1
Suppose an offshore platform is designed against the 200-year wave (i.e. a wave height corresponding to a
return period of 200 years), but it is intended to operate for 30 years only.
(a) What is the probability that it will be subject to waves exceeding its design value during the first year of operation? (ans. 0.005)
(b) What is the probability that the platform will not be subject to waves exceeding its design value during its lifetime? (ans. 0.860)
Solution:
(a) Since the return period, $\tau$, is 200 years, this means the yearly probability of exceedance (i.e. encountering waves exceeding the design value) is
$$p = 1/\tau = 1/(200\ \text{years}) = 0.005 \quad \text{(probability per year)}$$
(b) For each year, the probability of non-exceedance is 1 – 0.005 = 0.995, while the intended lifetime is 30 years. Hence non-exceedance during the whole lifetime has the binomial probability
$$0.995^{30} \approx 0.860$$
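The arithmetic of this solution can be checked with a minimal Python sketch (variable names are illustrative):

```python
return_period_years = 200   # design return period
lifetime_years = 30         # intended service life

# (a) The yearly exceedance probability is the reciprocal of the return period.
p_exceed = 1 / return_period_years

# (b) No exceedance in any of the 30 independent years (binomial with zero successes).
p_no_exceed_lifetime = (1 - p_exceed) ** lifetime_years

print(p_exceed, round(p_no_exceed_lifetime, 3))
```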
Problem 3-1-2
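Show that the mean and variance for a Poisson distribution are both $\lambda$.
Solution:
$$E(X) = \sum_{x=0}^{\infty} x\, \frac{\lambda^x}{x!}\, e^{-\lambda} = \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!}$$
(letting y = x – 1)
$$= \lambda e^{-\lambda} \sum_{y=0}^{\infty} \frac{\lambda^y}{y!} = \lambda e^{-\lambda} e^{\lambda} = \lambda$$
To calculate Var(X), we first calculate
$$E(X^2) = \sum_{x=0}^{\infty} x^2\, \frac{\lambda^x}{x!}\, e^{-\lambda} = \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{(x-1+1)\,\lambda^{x-1}}{(x-1)!} = \lambda e^{-\lambda} \left[ \sum_{x=1}^{\infty} \frac{(x-1)\,\lambda^{x-1}}{(x-1)!} + \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!} \right]$$
(rewriting x – 1 as y)
$$= \lambda e^{-\lambda} \left[ \sum_{y=0}^{\infty} \frac{y\,\lambda^y}{y!} + \sum_{y=0}^{\infty} \frac{\lambda^y}{y!} \right] = \lambda e^{-\lambda} \left( \lambda e^{\lambda} + e^{\lambda} \right) = \lambda^2 + \lambda,$$
hence
$$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = \lambda^2 + \lambda - \lambda^2 = \lambda.$$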
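The result can also be checked numerically by truncating the infinite sums. The sketch below is only a sanity check, not a proof; the truncation at 200 terms and the test value of 9 for the mean are assumptions made for illustration.

```python
from math import exp

def poisson_moments(lam: float, terms: int = 200):
    """Approximate E(X) and Var(X) of a Poisson variable by truncating the series."""
    pmf = exp(-lam)          # P(X = 0)
    mean = 0.0
    second_moment = 0.0
    for x in range(1, terms):
        pmf *= lam / x       # P(X = x) obtained from P(X = x - 1)
        mean += x * pmf
        second_moment += x * x * pmf
    return mean, second_moment - mean**2

mean, var = poisson_moments(9.0)
print(mean, var)   # both should be close to 9
```

Building each probability from the previous one avoids evaluating large factorials directly.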
Problem 3-1-3
In the fabrication of steel beams, two types of flaws may occur: (1) the inclusion of a small quantity of foreign
matter (“slag”); and (2) the existence of microscopic cracks. It has been found by careful laboratory
investigation that for a certain size I-beam from a given foundry the mean distance between microscopic cracks
is 40 feet along the beam, whereas the slag inclusions exist with an average rate of 4 per 100 feet of beam. Each
of these types of flaw follows a Poisson process.
(a) For a 20-foot I-beam of this size from this foundry, what is the chance of finding exactly 2 microscopic cracks in the beam? (ans. 0.076)
(b) For the same 20-foot beam, what is the chance of finding one or more slag inclusions? (ans. 0.551)
(c) If a 20-foot beam contained more than 2 flaws, it would be rejected. What is the probability that a 20-foot beam will be rejected? (ans. 0.143)
(d) Four 20-foot I-beams were supplied to a contractor by this foundry last year. Assume the flaw conditions between the four beams are statistically independent. What is the probability that only one of the beams had been rejected? (ans. 0.360)
Solution:
(a) Let C be the number of microscopic cracks along a 20-foot beam. C has a Poisson distribution with mean rate $\nu_C = 1/40$ (number per foot) and length of observation t = 20 feet, hence the parameter $\lambda = (1/40)(20) = 0.5$, thus
$$P(C = 2) = e^{-0.5}\,(0.5^2 / 2!) \approx 0.076$$
(b) Let S denote the number of slag inclusions along a 20-foot beam. S has a Poisson distribution with mean rate $\nu_S = 4/100 = 1/25$ (number per foot) and length of observation t = 20 feet, hence the parameter $\lambda = (1/25)(20) = 0.8$, thus
$$P(S \ge 1) = 1 - P(S = 0) = 1 - e^{-0.8} \approx 0.551$$
(c) Let X be the total number of flaws along a 20-foot beam. Along 1000 feet (say) of such a beam, one can expect (1/40)(1000) = 25 cracks and (1/25)(1000) = 40 slag inclusions. Hence the mean rate of flaws is $\nu = (25 + 40)/1000$, or simply (1/40) + (1/25) = 0.065 flaws per foot, which is multiplied by the length of observation, t = 20 feet, to get the parameter $\lambda = 1.3$. Hence
$$P(X > 2) = 1 - P(X \le 2) = 1 - e^{-1.3}\,(1 + 1.3 + 1.3^2 / 2!) = 1 - 0.857112489 \approx 0.143$$
(d) Thinking of beam rejection as “success”, the total number (N) of beams rejected among 4 follows a binomial distribution with n = 4 and p = the answer in part (c). Thus
$$P(N = 1) = \binom{4}{1}\, p^1 (1 - p)^3 = 4 \times 0.142887511^1 \times 0.857112489^3 \approx 0.360$$
Problem 3-1-4
The air quality in an industrial city may become substandard (poor) at times depending on the weather condition
and the amount of factory production. Suppose the event of poor air quality occurs as a Poisson process with a
mean rate of once per month. During each time period when the air quality becomes substandard, its pollutant
concentration may reach a hazardous level with a 10% probability. Assume that the pollutant concentrations in any two periods of poor air quality are statistically independent.
(a) What is the probability of at most 2 periods of poor air quality during the next 4-1/2 months? (ans. 0.174)
(b) What is the probability that the air quality would ever reach a hazardous level during the next three months? (ans. 0.259)
Solution:
(a) Let N be the number of poor air quality periods during the next 4.5 months; N follows a Poisson distribution with mean value (1/month)(4.5 months) = 4.5, hence
$$P(N \le 2) = e^{-4.5}\,(1 + 4.5 + 4.5^2 / 2!) \approx 0.174$$
(b) Since only 10% of poor quality periods have hazardous levels, the “hazardous” periods (H) must occur at a mean rate of (1 per month) × 10% = 0.1 per month. Hence, over 3 months, H has the mean
$$\lambda_H = (0.1)(3) = 0.3$$
$$\Rightarrow P(\text{ever hazardous}) = 1 - P(H = 0) = 1 - e^{-0.3} \approx 0.259$$
Alternative approach: use the total probability theorem. Although there is a (1 – 0.1) = 0.9 probability of a non-hazardous pollution level during a poor air quality period, during a 3-month period there could be any number (n) of such periods, and the probability of a non-hazardous level reduces to $0.9^n$ for a given n. Hence the total probability of a non-hazardous level during the whole time is
$$\sum_{n=0}^{\infty} 0.9^n\, P(N = n) = \sum_{n=0}^{\infty} 0.9^n\, \frac{e^{-(1 \times 3)}\,(1 \times 3)^n}{n!} = e^{-3} \sum_{n=0}^{\infty} \frac{(3 \times 0.9)^n}{n!} = e^{-3}\, e^{3 \times 0.9} = e^{-0.3},$$
hence
$$P(\text{ever hazardous}) = 1 - P(\text{never hazardous}) = 1 - e^{-0.3} = 1 - 0.740818221 \approx 0.259$$
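The two approaches to part (b) can be compared numerically. In this minimal Python sketch the infinite sum of the total probability theorem is truncated at 100 terms, which is an assumption of the sketch; the variable names are illustrative.

```python
from math import exp, factorial

rate_poor = 1.0      # poor-air-quality periods per month
p_hazard = 0.1       # chance that a poor period reaches a hazardous level
months = 3

# Approach 1: hazardous periods occur as a Poisson process with a reduced mean rate.
p_direct = 1 - exp(-rate_poor * p_hazard * months)

# Approach 2: total probability over the number n of poor periods
# (series truncated at 100 terms, an assumption of this sketch).
lam = rate_poor * months
p_never = sum((1 - p_hazard) ** n * exp(-lam) * lam**n / factorial(n)
              for n in range(100))
p_total = 1 - p_never

print(round(p_direct, 3), round(p_total, 3))
```

Both numbers should agree to within floating-point error, since the remaining terms of the series are negligible.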



Problem 3-1-5
A country is subject to natural hazards such as floods, earthquakes and tornadoes. Suppose earthquakes occur
according to a Poisson process with a mean rate of one in ten years; tornado occurrences are also Poisson with
mean rate of 0.3 per year. There can be either one or no flood each year; hence the occurrence of a flood each
year follows a Bernoulli sequence, and the mean return period of floods is 5 years. Assume floods, earthquakes
and tornadoes occur independently.
(a) If no hazards occur during a given year, it is referred to as a “good” year. What is the probability of a “good” year? (ans. 0.536)
(b) What is the probability that two of the next five years will be good years? (ans. 0.287)
(c) What is the probability of only one incidence of natural hazard in a given year? (ans. 0.349)
Solution:
(a) Let E and T denote the number of earthquakes and tornadoes in one year, respectively. They are both Poisson random variables with respective means
$$\lambda_E = \nu_E t = \frac{1}{10\ \text{years}} \times 1\ \text{year} = 0.1; \qquad \lambda_T = 0.3$$
Also, the (yearly) probability of flooding is P(F) = 1/5 = 0.2, hence, due to statistical independence among E, T, F,
$$\Rightarrow P(\text{good}) = P(E = 0)\,P(T = 0)\,P(F') = e^{-0.1}\, e^{-0.3}\,(1 - 0.2) = e^{-0.4} \times 0.8 \approx 0.536$$
Note: alternatively, we can let D be the combined number of earthquakes and/or tornadoes, with mean rate $\nu_D = \nu_E + \nu_T = 0.1 + 0.3 = 0.4$ (disasters per year), and compute $P(D = 0)\,P(F') = e^{-0.4} \times 0.8$ instead.
(b) In each year, P(good year) ≡ p ≈ 0.536 (from (a)). Hence
$$P(\text{2 out of 5 years are good}) = \binom{5}{2}\, p^2 (1 - p)^3 \approx 0.287$$
(c) Let’s work with D as defined in (a).
$$P(\text{only one incidence of natural hazard}) = P(D = 0)\,P(F) + P(D = 1)\,P(F') = e^{-0.4} \times 0.2 + (e^{-0.4} \times 0.4)(1 - 0.2) \approx 0.349$$
Problem 3-1-6
Highway traffic conditions during a blizzard are hazardous. Suppose one traffic accident is expected to occur in
each 50 miles of highway on a blizzard day. Assume that occurrences of accidents along the highway are
modeled by a Poisson process. Consider a stretch of highway that is 20 miles long.
(a) What is the probability that at least one accident will occur on a given blizzard day? (ans. 0.33)
(b) Suppose there are five blizzard days this winter. What is the probability that two out of these five blizzard days are accident free? Assume that accident occurrences between blizzard days are statistically independent. (ans. 0.16)
Solution:
(a) Let X be the number of accidents along the 20 miles on a given blizzard day. X has a Poisson distribution with
$$\lambda_X = \frac{1}{50\ \text{miles}} \times 20\ \text{miles} = 0.4,$$
hence
$$P(X \ge 1) = 1 - P(X = 0) = 1 - e^{-0.4} = 1 - 0.670320046 \approx 0.33$$
(b) Let Y be the number of accident-free days among five blizzard days. With n = 5, and p = daily accident-free probability = P(X = 0) ≈ 0.670, we obtain
$$P(Y = 2) = \binom{5}{2}\, p^2 (1 - p)^3 \approx 0.16$$
Problem 3-1-7
The occurrence of accidents at a busy intersection may be described by a Poisson process with an average rate
of three accidents per year.
(a) Determine the probability of exactly one accident over a two-month period. Would this be the same as the probability of exactly two accidents in a four-month period? Explain. (ans. 0.303, no)
(b) If fatalities are involved in 20% of the accidents, what is the probability of fatalities occurring at this intersection over a period of two months? Assume that events of fatalities between accidents are statistically independent. (ans. 0.095)
Solution:
(a) Let X be the number of accidents in two months. X has a Poisson distribution with
$$\lambda_X = \frac{3}{12\ \text{months}} \times 2\ \text{months} = 0.5,$$
hence
$$P(X = 1) = e^{-0.5} \times 0.5 \approx 0.303,$$
whereas
$$P(\text{2 accidents in 4 months}) = e^{-(3/12)(4)}\, [(3/12)(4)]^2 / 2! = e^{-1} / 2! \approx 0.184$$
No, P(1 accident in 2 months) and P(2 accidents in 4 months) are not the same.
(b) Since 20% of all accidents are fatal, the mean rate of fatal accidents is
$$\nu_F = \nu_X \times 0.2 = (3/12)(0.2) = 0.05 \text{ per month}$$
Hence the number of fatal accidents in two months, F, has a Poisson distribution with mean
$$\lambda_F = (0.05 \text{ per month})(2 \text{ months}) = 0.1,$$
hence
$$P(\text{fatalities in two months}) = 1 - P(F = 0) = 1 - e^{-0.1} \approx 0.095$$
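A minimal Python sketch of both parts, which also makes the comparison in part (a) explicit; `poisson_pmf` and the variable names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam**x / factorial(x) * exp(-lam)

rate_per_month = 3 / 12    # accidents per month

p_one_in_two = poisson_pmf(1, rate_per_month * 2)    # (a) 1 accident in 2 months
p_two_in_four = poisson_pmf(2, rate_per_month * 4)   # (a) 2 accidents in 4 months

fatal_rate = rate_per_month * 0.2                    # fatal accidents per month
p_fatal = 1 - poisson_pmf(0, fatal_rate * 2)         # (b) any fatal accident in 2 months

print(round(p_one_in_two, 3), round(p_two_in_four, 3), round(p_fatal, 3))
```

Doubling both the count and the period doubles the mean, but the Poisson probability of hitting the mean rate exactly is not preserved, which is why the two values in part (a) differ.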
Exercises
Exercise 3-1-1
A town is bordered by two rivers as shown in the following figure. Levees A and B were constructed to protect
the town from high water in the rivers. The design return periods of levees A and B are 5 and 10 years
respectively.
[Figure: the town bordered by two rivers, protected by levees A and B]
Assume that the events of flooding from the two rivers are statistically independent.
(a) Determine the probability that the town will encounter flooding in a given year. (ans. 0.28)
(b) What is the probability that the town will be flooded in at least two of the next five years? (ans. 0.43)
(c) Suppose the townspeople desired to reduce the annual probability of flooding to at most 15%. Levee A may
be improved to have return periods of 10 or 20 years with an investment of 5 and 20 million dollars
respectively; whereas levee B may be improved to have return periods of 20 or 30 years with an investment
of 10 and 20 million dollars respectively. What is the optimal course of action? (ans. A to 10 years and B to
20 years)
Exercise 3-1-2
A contractor submits bids to 3 highway jobs and 2 building jobs. The probability of winning each job is 0.6.
Assume that winning each job is an independent event.
(a) What is the probability that the contractor will win at most one job? (ans. 0.087)
(b) What is the probability that the contractor will win at least two jobs? (ans. 0.913)
(c) What is the probability that he will win exactly 1 highway job, but none of the building jobs? (ans. 0.046)
Exercise 3-1-3
The exterior of a building consists of one hundred 3 m × 5 m glass panels. Past records indicate that on the average one flaw is found in every 50 m² of this kind of glass panel; also a panel containing two or more flaws will eventually cause breakage problems and have to be replaced.
(a) What is the probability that a given panel will be replaced? (ans. 0.037)
(b) Replacement of glass panel is usually expensive. If each replacement costs $5,000, what is the expected cost
for replacements on the building? (ans. $18,500)
(c) A higher-grade glass which costs $100 more per panel has on the average one flaw in every 80 m². Should
you recommend using the higher grade panel, if the objective is to minimize the expected total cost of the
glass panels (initial cost and replacement cost)? (ans. yes)
Exercise 3-1-4
The truck traffic on a certain highway can be described as a Poisson process with a mean arrival rate of 1 truck
per minute. The weight of each truck is random, and the probability that a truck is overloaded is 10%.
(a) What is the probability that there will be at least two trucks passing a weigh station on this highway in a 5-minute period? (ans. 0.96)
(b) What is the probability that at most one of the next five trucks stopping at the weigh station will be
overloaded? (ans. 0.92)
(c) Suppose the weigh station will close for 30 minutes during lunch; what is the probability of overloaded
trucks passing the station during the lunch break? (ans. 0.95)
Exercise 3-1-5
The occurrence of tornadoes in a county can be modeled as a Poisson process. Twenty tornadoes have touched
down in a county within the last twenty years. If there is at least one tornado occurring in a year, that year is
classified as a "tornado year."
(a) What is the probability that next year will be a "tornado year?" (ans. 0.632)
(b) What is the probability that there will be two "tornado years" within the next three years? (ans. 0.441)
(c) On the average over a ten-year period,
(i) How many tornadoes are expected to occur? (ans. 10)
(ii) How many "tornado years" are expected to occur? (ans. 6.32)
Exercise 3-1-6
Strong earthquakes occur according to a Poisson process in a metropolitan area with a mean rate of once in fifty
years. There are three bridges in the metropolitan area. When a strong earthquake occurs, there is a probability
of 0.3 that a given bridge will collapse. Assume the events of collapse between bridges during a strong earthquake are statistically independent, and that the events of bridge collapse between earthquakes are also statistically independent.
(a) What is the probability of at most one strong earthquake occurring in this metropolitan area within the next
20 years? (ans. 0.938)
(b) During a strong earthquake, what is the probability that exactly one of the three bridges will collapse? (ans.
0.441)
(c) What is the probability of "no bridge collapse from strong earthquakes" during the next 20 years? (ans.
0.769)
Exercise 3-1-7
One of the hazards to an existing underground pipeline is due to improperly conducted excavations. Consider a
system consisting of 100 miles of pipeline. Suppose the number of excavations along this pipeline over the next
year follows a Poisson process with a mean rate of 1 per 50 miles. 40% of the excavations are expected to result
in damage to the pipeline. Assume the events of damage between excavations are statistically independent.
(a) What is the probability that there will be at least two excavations along the pipeline next year? (ans. 0.594)
(b) Suppose two excavations are indeed performed; what is the probability that the pipeline will be damaged? (ans. 0.64)
(c) What is the probability that the pipeline will not be damaged from excavations next year? (ans. 0.449)
Exercise 3-1-8
Flaws in welding may be assumed to occur according to a Poisson process with a mean rate of 0.1 per foot of
weld.
(a) Suppose a typical structural connection requires 30 inches of weld and acceptance of such a connection requires no flaws in the weld. What is the probability that a connection will be acceptable? (ans. 0.779)
(b) For a welding job consisting of three similar structural connections, what is the probability that at least 2 connections will be acceptable? (ans. 0.875)
(c) What is the probability that there is altogether only 1 flaw in three structural connections? (ans. 0.354)
Exercise 3-1-9
Highway traffic accidents can be classified into either injury (I) or noninjury (N) accidents. In a given year, the occurrence rates of these two types of accidents along a stretch of highway are 0.01 and 0.05 per mile, respectively. Assume that the occurrence of each type of accident along the highway follows a Poisson process.
Consider a highway that runs between two cities that are 50 miles apart.
(a) Determine the probability that there will be exactly two noninjury accidents in a given year. (ans. 0.257)
(b) Determine the probability that there will be at least three accidents in a given year. (ans. 0.577)
(c) Suppose exactly two accidents occurred last year; what is the probability that both of them involved injuries? (ans. 0.028)
Exercise 3-1-10
The occurrence of thunderstorms in Peoria, Illinois may be assumed to follow a Poisson process during each of
the two seasons, namely:
I. Winter (October to March)
II. Summer (April to September)
A 21-year record reveals that a total of 173 thunderstorms have taken place during the winter seasons, whereas
840 thunderstorms have occurred during the summer seasons.
(a) Estimate the mean rate of occurrence of thunderstorms per month for
(i) the winter season; and
(ii) the summer season. (ans. (i) 1.37, (ii) 6.67)
(b) What is the probability that there will be a total of 4 thunderstorms during the two months of March and April next year? (ans. 0.056)
(c) What is the probability that there will be no December thunderstorms during two out of the next five years? (ans. 0.267)
Exercise 3-1-11
Geomembrane is often used to provide an effective impervious barrier in a waste containment lining system.
The geomembrane has to be sewn together to cover the entire site; defects can thus occur along the seams.
Consider a landfill construction project that requires 3000 meters of seams and the quality of the seaming
operation is such that defects will occur along the seams at a mean rate of one per 200 meters. The
geomembrane layer is inspected after the installation and those defects that are detected will be repaired.
However, some of the defects will not be detected during the inspection; they will remain and can cause
unsatisfactory performance of the lining system. Suppose the current inspection procedure fails to detect 20% of
the defects.
(a) What is the mean rate of defects along the seams that remain in the system after the inspection? (ans. 0.001 per meter)
(b) Assume that the defects that remain undetected occur according to a Poisson process. What is the probability that there will be more than two defects remaining in the lining system? (ans. 0.577)
(c) Consider a similar but smaller project involving only 1000 meters of seams. However, defects in the geomembrane seams are very undesirable for this project. It is required to achieve a 95% probability that the geomembrane lining system will be free of defects after the inspection. Assume that the quality of the seaming operation is the same as earlier (i.e. same mean rate of defects before inspection), but the inspection effort can be improved to reduce the percentage of undetected defects. What is the allowable fraction of undetected defects for this improved inspection procedure? (ans. 1%)