Uploaded by Georgi Georgiev

Discrete Distributions: Binomial & Poisson

Topic № 5
Discrete distributions
After reading Topic № 5, you will be able to:
• compute the probability of an event;
• model real processes with the binomial and Poisson distributions;
• carry out quality control;
• predict technological accidents, security breaches, technological risk, etc.
A discrete (or discontinuous) variable is a quantitative variable that can take only a finite or countable number of separate values within a given interval. Such a variable takes only certain values, for example integers (counts) or one of two opposite values (dichotomous variables). Examples include the number of employees, companies or vehicles, the population of a given territory, amounts expressed in a given currency, etc. All categorical (qualitative) variables and some quantitative ones are discrete.
The discrete probability distributions most often used in modern practice include the binomial, Poisson, hypergeometric, negative binomial and multinomial distributions. This topic discusses the binomial and Poisson distributions in detail and illustrates their practical applications.
1. Binomial distribution
After the normal and lognormal distributions, the binomial is one of the most commonly used theoretical distributions in many areas of real life. It describes the number of successes in a series of trials, each of which is a dichotomous variable that can have only two outcomes. A single such trial follows the Bernoulli distribution, which is the special case n = 1 of the binomial distribution; for this reason the binomial distribution is sometimes also named after Bernoulli. It is used in finance, insurance, business practice, experimental research, gambling, etc.
Features:
• a discontinuous (discrete), unimodal theoretical distribution;
• X can take only non-negative integer values 0, 1, ..., n;
• it is defined by only two parameters: the probability of success p and the number of trials n;
• it is symmetrical only when the probabilities of the two alternative events are equal, i.e., p = q = 0.5;
• when p < 0.5, positive (right) asymmetry is observed, which increases as p decreases;
• when p > 0.5, negative (left) asymmetry is observed, which increases as p increases;
• the trials are independent and the probability of success is the same for all of them;
• the probabilities of all possible values (the total area of the PDF histogram) always sum to 1, regardless of p and n.
The binomial distribution is appropriate for real processes or experiments in which only two outcomes are possible: positive or negative, profit or loss, success or failure, even or odd, etc. It is important that the experiment can be repeated many times and that the probabilities do not change as the number of repetitions grows.
Practical example 1
You decide to play roulette and place 3 bets, each time on an odd number. What is the probability that an odd number comes up 0, 1, 2 or 3 times?
Solution
Each bet has two equally likely outcomes, even or odd (for simplicity, the zero pocket is ignored). The probability of one outcome is p = ½ and of the other q = (1 - p) = ½, because the probabilities of all outcomes always sum to 1, i.e. p + q = 1.
The possible results of three bets are:
1. no odd number (0 odd);
2. exactly 1 odd number;
3. exactly 2 odd numbers;
4. 3 odd numbers.
Table 3.1 lists all possible combinations of results for the three bets.
Table 3.1
Determining the combinations
Combination | First bet | Second bet | Third bet | Number of odd
1 | even | even | even | 0
2 | even | even | odd | 1
3 | even | odd | even | 1
4 | odd | even | even | 1
5 | even | odd | odd | 2
6 | odd | even | odd | 2
7 | odd | odd | even | 2
8 | odd | odd | odd | 3
Table 3.2 shows how the probability of each result in Table 3.1 is calculated: the number of combinations that give that result is divided by the total number of combinations (8).
Table 3.2
Calculation of probabilities
Possible favorable results (values of the random variable) | Probability
0 | 1/8 = 0.125
1 | 3/8 = 0.375
2 | 3/8 = 0.375
3 | 1/8 = 0.125
Total | 1.00
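The counting behind Tables 3.1 and 3.2 can be reproduced in a few lines of Python. This is a minimal sketch, assuming, as in the example, that even and odd are equally likely on each bet; the names `outcomes`, `counts` and `probabilities` are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 = 8 combinations of three bets (Table 3.1).
outcomes = list(product(["even", "odd"], repeat=3))

# Number of odd results in each combination.
counts = Counter(combo.count("odd") for combo in outcomes)

# Probability of each value = favorable combinations / all combinations (Table 3.2).
probabilities = {k: Fraction(counts[k], len(outcomes)) for k in sorted(counts)}

for k, prob in probabilities.items():
    print(k, prob, float(prob))
```

Exact fractions are used so that the result matches the 1/8 and 3/8 entries of Table 3.2 without rounding.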
The data in Table 3.2 give the shape of the probability density function of the binomial distribution; the corresponding histogram is shown in Graph 3.1.
Graph 3.1
The analytical expression of the probability density
function of the binomial distribution has the following form:
$$\mathrm{PDF}(x) = C_N^x \, p^x q^{N-x}$$
where:
x is the value of the discrete random variable X, the number of positive (or negative) results in N trials;
N is the number of trials;
$C_N^x$ is the number of combinations of N elements of class x;
p is the probability of a positive result;
q is the probability of a negative result.
The number of combinations of N elements of class x is calculated by the following formula:
$$C_N^x = \frac{N!}{x!\,(N-x)!}$$
The probability of a negative result is q = 1 - p.
Substituting these expressions, the probability density function of the binomial distribution takes the following expanded form:
$$\mathrm{PDF}(x) = \frac{N!}{x!\,(N-x)!}\, p^x (1-p)^{N-x}$$
This formula is known as Bernoulli's formula, named after the famous 17th-century Swiss mathematician Jakob Bernoulli, who is considered one of the founders of probability theory. It gives the probability of every possible number of positive results when each trial has only two possible outcomes.
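Bernoulli's formula translates directly into code. The sketch below uses Python's standard `math.comb` (Python 3.8 or later) for the number of combinations; the function name `binomial_pdf` is our own, not part of any library.

```python
import math

def binomial_pdf(x: int, n: int, p: float) -> float:
    """Probability of exactly x positive results in n independent trials (Bernoulli's formula)."""
    if not 0 <= x <= n:
        return 0.0  # values outside 0..n are impossible
    q = 1.0 - p  # probability of a negative result
    return math.comb(n, x) * p**x * q**(n - x)

# Roulette example: 3 bets on odd, p = 0.5
for x in range(4):
    print(x, binomial_pdf(x, 3, 0.5))
```

Looping over x = 0..3 reproduces the probabilities of Table 3.2 from the formula instead of by counting.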
Practical example 2
Using the data from the previous example, calculate the probability that, if we bet on an odd number 3 times in roulette, an odd number comes up exactly 2 times.
Solution
Substituting N = 3, x = 2 and p = 0.5 into the probability density function of the binomial distribution gives:
$$\mathrm{PDF}(2) = \frac{3\cdot 2\cdot 1}{(2\cdot 1)\cdot 1}\, 0.5^2 (1-0.5)^1 = 3 \cdot 0.25 \cdot 0.5 = 0.375$$
i.e. the probability is 37.5%, the same value obtained by counting combinations in Table 3.2 (row x = 2).
For convenience, the Excel function BINOMDIST can be used to calculate the binomial probability density function (Figure 3.1).
Figure 3.1
Probability calculation using BINOMDIST
The Excel statistical function BINOMDIST can return both the probability density function (PDF) and the cumulative distribution function (CDF) of the binomial distribution. In Practical example 2 we want the probability of a single value, so in the last field of the dialog box, Cumulative, we enter FALSE (or 0); Excel then uses the probability density function (Figure 3.1).
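To make the role of the last argument concrete, here is a hypothetical Python counterpart of BINOMDIST(number_s, trials, probability_s, cumulative). It only mirrors the logic of the Excel function (PDF when cumulative is FALSE, CDF when TRUE); it is not Excel's implementation, and the name `binomdist` is our own.

```python
import math

def binomdist(number_s: int, trials: int, probability_s: float, cumulative: bool) -> float:
    """Hypothetical analogue of Excel's BINOMDIST.

    cumulative=False -> P(X = number_s)   (probability density function)
    cumulative=True  -> P(X <= number_s)  (cumulative distribution function)
    """
    def pdf(k: int) -> float:
        return math.comb(trials, k) * probability_s**k * (1 - probability_s)**(trials - k)

    if cumulative:
        # CDF: add up the PDF for every value from 0 to number_s
        return sum(pdf(k) for k in range(number_s + 1))
    return pdf(number_s)

print(binomdist(2, 3, 0.5, False))  # Practical example 2: exactly 2 odd results
print(binomdist(2, 3, 0.5, True))   # for comparison: at most 2 odd results
```

The only difference between the two calls is whether the single PDF value is returned or the PDF values are accumulated.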
Practical example 3
What is the probability that, if you toss a coin 5 times, heads will come up exactly 3 times?
Answer
$$\mathrm{PDF}(3) = \frac{5\cdot 4\cdot 3\cdot 2\cdot 1}{(3\cdot 2\cdot 1)(2\cdot 1)}\, 0.5^3 (1-0.5)^2 = 10 \cdot 0.125 \cdot 0.25 = 0.3125$$
Using Excel's binomial PDF function, the calculation is:
=BINOMDIST(3; 5; 0.5; FALSE) = 0.3125
i.e. the probability of getting heads exactly 3 times in 5 tosses is 31.25%.
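The 31.25% can also be read as a long-run frequency. Below is a minimal Monte Carlo sketch, assuming a fair coin and using Python's standard `random` module with a fixed seed so that the run is repeatable; the function name `simulate_heads` and the number of repetitions are arbitrary choices made for illustration.

```python
import random

def simulate_heads(tosses: int, target: int, repetitions: int, seed: int = 1) -> float:
    """Share of repetitions in which exactly `target` heads appear in `tosses` fair tosses."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repetitions):
        # each toss is heads with probability 0.5
        heads = sum(rng.random() < 0.5 for _ in range(tosses))
        if heads == target:
            hits += 1
    return hits / repetitions

estimate = simulate_heads(tosses=5, target=3, repetitions=100_000)
print(estimate)  # an estimate close to 0.3125
```

The simulated share only approximates the exact binomial value; the approximation improves as the number of repetitions grows.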
The shape of the binomial probability function depends on the probability p and the number of trials n. The Excel function BINOMDIST was used to plot the different shapes.
Graph 3.2 shows the binomial distribution with p = 0.10 and n = 10: it has a strong right (positive) asymmetry.
Graph 3.2
PDF at p = 0.10 and n = 10
Graph 3.3 shows that as p increases to 0.25, the right asymmetry changes from strong to moderate.
Graph 3.3
PDF at p = 0.25 and n = 10
It can be concluded that when the probability of a favorable result is less than 0.5, right asymmetry is observed; when both probabilities are equal, the distribution is symmetric; and when the probability is above 0.5, left asymmetry is observed.
Graph 3.4 shows that the binomial distribution is symmetric when the probability of success equals the probability of failure.
Graph 3.4
PDF at p = 0.5 and n = 10
Graph 3.5 shows that when the probability of a favorable result is higher than that of an unfavorable one, a
left asymmetry is observed.
Graph 3.5
PDF at p = 0.7 and n = 10
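The direction of asymmetry seen in Graphs 3.2 to 3.5 can be checked numerically with the standard skewness coefficient of the binomial distribution, (q - p)/√(npq), which is positive for right asymmetry, zero for symmetry and negative for left asymmetry. This formula is not given in the text above; it is the standard textbook result, and the sketch simply evaluates it for the four plotted cases (n = 10). The function name `binomial_skewness` is our own.

```python
import math

def binomial_skewness(n: int, p: float) -> float:
    """Skewness of the binomial distribution: (q - p) / sqrt(n * p * q)."""
    q = 1 - p
    return (q - p) / math.sqrt(n * p * q)

for p in (0.10, 0.25, 0.50, 0.70):
    s = binomial_skewness(10, p)
    shape = "right (positive)" if s > 0 else "left (negative)" if s < 0 else "symmetric"
    print(f"p = {p:.2f}: skewness = {s:+.3f} -> {shape}")
```

The sign of the coefficient matches the conclusion drawn from the graphs, and its size shrinks as p moves toward 0.5.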
Practical example 4
Defective video cards have been installed in 12 of the 80 computers in the last batch of a computer assembly company.
a) A customer wants to buy 5 computers immediately. What is the probability that exactly one of them has a defective board?
b) What is the probability that no more than 4 of 30 computers sold are defective?
Solution
a) The probability that a computer is defective is 12/80 = 0.15. We then calculate the probability with Excel's BINOMDIST function, which returns the binomial PDF:
Figure 3.2
The probability that exactly one of the 5 computers has a defective board is 39.15%.
b) Here we need not the probability of a single value but the probability that the number of defective computers is less than or equal to a given value, i.e. we must use the cumulative distribution function (CDF) of the binomial distribution.
Figure 3.3
To calculate the cumulative probability, we enter TRUE (or 1) in the last field of the BINOMDIST dialog box (Figure 3.3). As Figure 3.3 shows, the probability that no more than 4 of the 30 computers sold are defective is 0.5245, i.e. 52.45%.
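Both parts of Practical example 4 can be reproduced outside Excel. This is a minimal sketch, assuming, as the text does, a constant defect probability of 12/80 = 0.15 for every computer (the binomial model, which treats the computers as independent trials). The helper names are our own.

```python
import math

def binomial_pdf(x: int, n: int, p: float) -> float:
    """P(X = x) for the binomial distribution."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x): sum of the PDF from 0 to x."""
    return sum(binomial_pdf(k, n, p) for k in range(x + 1))

p_defect = 12 / 80  # 0.15

part_a = binomial_pdf(1, 5, p_defect)   # exactly 1 defective among 5 computers
part_b = binomial_cdf(4, 30, p_defect)  # at most 4 defective among 30 computers

print(f"a) {part_a:.4f}")
print(f"b) {part_b:.4f}")
```

Part a) corresponds to BINOMDIST with cumulative = FALSE and part b) to cumulative = TRUE.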
Calculating the moments of the binomial distribution
For a binomial random variable X (the number of positive results in n trials), the arithmetic mean (mathematical expectation) is calculated by the following formula:
$$\mu = np$$
The standard deviation is calculated as:
$$\sigma = \sqrt{npq} = \sqrt{np(1-p)}$$
For a single trial (n = 1) these reduce to μ = p, i.e. the probability of a positive outcome, and σ = √(pq) = √(p(1 - p)). The relative proportion X/n of positive outcomes also has mean p.
Covariance is equal to:
$$\mathrm{Cov}_{X,Y} = \sigma_X \, \sigma_Y \, \rho_{X,Y},$$
where $\rho_{X,Y}$ is the correlation coefficient between X and Y.
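The mean and standard deviation of the binomial distribution can be checked directly from the probabilities: the mean is the sum of x·PDF(x) and the variance is the sum of (x - μ)²·PDF(x) over all values x = 0..n. For the count of positive results in n trials these give np and √(np(1 - p)), which reduce to p and √(p(1 - p)) for a single trial. The sketch below uses n = 10 and p = 0.25, values chosen only for illustration; the function names are our own.

```python
import math

def binomial_pdf(x: int, n: int, p: float) -> float:
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def moments_from_pdf(n: int, p: float):
    """Mean and standard deviation computed by summing over all values 0..n."""
    xs = range(n + 1)
    mean = sum(x * binomial_pdf(x, n, p) for x in xs)
    variance = sum((x - mean) ** 2 * binomial_pdf(x, n, p) for x in xs)
    return mean, math.sqrt(variance)

n, p = 10, 0.25
mean, sd = moments_from_pdf(n, p)
print(mean, n * p)                        # arithmetic mean vs. np
print(sd, math.sqrt(n * p * (1 - p)))     # standard deviation vs. sqrt(npq)
print(moments_from_pdf(1, p))             # single trial: (p, sqrt(p * q))
```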
2. Poisson distribution
The Poisson distribution is named after the French mathematician Simeon Denis Poisson, who published it in 1837¹. It is a limiting form of the binomial distribution when p tends to 0 and n increases indefinitely while the mean np = λ stays constant. The Poisson distribution is the probability distribution of a discrete random variable that counts the number of statistically independent events occurring within a unit of time or space.
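The limiting relationship can be seen numerically: keep λ = np fixed, let n grow while p = λ/n shrinks, and compare the binomial probability with the Poisson one. This is a minimal sketch; λ = 4 and k = 6 (the values of Practical example 5) are used only for illustration, and the helper names are our own.

```python
import math

def binomial_pdf(k: int, n: int, p: float) -> float:
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pdf(k: int, lam: float) -> float:
    return lam**k * math.exp(-lam) / math.factorial(k)

lam, k = 4.0, 6
for n in (10, 100, 1_000, 10_000):
    p = lam / n  # n * p stays equal to lambda
    print(n, round(binomial_pdf(k, n, p), 6), round(poisson_pdf(k, lam), 6))
```

As n increases, the binomial column approaches the constant Poisson column.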
It is used in practice for quality control and risk measurement, i.e. when measuring the number (frequency) of occurrences of an event over a given period of time (e.g. accidents per month, calls per hour, errors per 1,000 transactions, aircraft landings per hour, etc.).
Features:
• a discontinuous (discrete), unimodal theoretical distribution;
• it is defined by only one parameter λ, because the arithmetic mean and the variance coincide, i.e. µ = σ² = λ;
• the closer λ is to 0, the stronger the right asymmetry, and the distribution takes a reverse-J shape;
• as λ increases, the right asymmetry decreases (the skewness equals 1/√λ); at about λ = 7 the distribution is approximately symmetrical, and for larger λ it approaches the normal distribution, but it never becomes left-skewed;
• X can take only non-negative integer values 0, 1, 2, ...;
• the probabilities of all possible values (the total area of the PDF histogram) always sum to 1, regardless of λ.
¹ Letkowski J., Applications of the Poisson probability distribution, Western New England University, 2012.
The probability density function of the Poisson distribution is calculated by the following formula:
$$\mathrm{PDF}(X = x) = \frac{\lambda^x e^{-\lambda}}{x!},$$
where: λ is the parameter (the average number of events per unit of time or space) that determines the shape and location of the distribution;
e is the mathematical constant known as Euler's number (not to be confused with Euler's constant), after the Swiss mathematician Leonhard Euler, or Napier's constant, e ≈ 2.71828182845904.
In Excel, the Poisson PDF is calculated with:
=POISSON(x; λ; FALSE)
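As with BINOMDIST, a hypothetical Python counterpart of the Excel call can clarify what the arguments mean: the first is the number of events x, the second the mean λ, and the third selects the PDF (FALSE) or the cumulative probability P(X ≤ x) (TRUE). The name `poisson` and its signature are our own, not an Excel or library API.

```python
import math

def poisson(x: int, lam: float, cumulative: bool) -> float:
    """Hypothetical analogue of Excel's POISSON(x; lambda; cumulative)."""
    def pdf(k: int) -> float:
        return lam**k * math.exp(-lam) / math.factorial(k)

    if cumulative:
        # P(X <= x): sum of the PDF from 0 to x
        return sum(pdf(k) for k in range(x + 1))
    return pdf(x)

print(round(poisson(6, 4, False), 4))  # Practical example 5: exactly 6 accidents
print(round(poisson(6, 4, True), 4))   # at most 6 accidents in a month
```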
Practical example 5
If an average of 4 accidents per month is observed in an operational process, what is the probability that exactly 6 accidents will occur in a month?
Solution
$$\mathrm{PDF}(X = 6) = \frac{\lambda^x e^{-\lambda}}{x!} = \frac{4^6 e^{-4}}{6!} \approx 0.1042$$
Figure 3.4
Calculation using the Excel POISSON function
Practical example 6
If on average 64 breaches of the bank's internal network occur per year, what is the probability that exactly 90 will occur next year?
Solution
$$\mathrm{PDF}(X = 90) = \frac{64^{90} e^{-64}}{90!} \approx 0.00039$$
i.e. the probability that the number of breaches rises to 90 next year is only about 0.039%.
The shape of the PDF of the Poisson distribution depends only on the parameter λ.
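For large λ and x, the terms λ^x and x! become very large and, for large enough values, overflow floating-point arithmetic. A common way to evaluate the Poisson PDF safely is to work with logarithms, using Python's standard `math.lgamma` (lgamma(x + 1) equals ln x!). This is a minimal sketch reproducing Practical example 6; the function name `poisson_pdf_log` is our own.

```python
import math

def poisson_pdf_log(x: int, lam: float) -> float:
    """Poisson PDF computed in log space: exp(x*ln(lam) - lam - ln(x!))."""
    log_p = x * math.log(lam) - lam - math.lgamma(x + 1)
    return math.exp(log_p)

p90 = poisson_pdf_log(90, 64)
print(f"{p90:.6f}  ({p90:.3%})")
```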
Graph 3.6
Poisson distribution at λ = 0.5
Graph 3.6 shows the Poisson distribution at λ = 0.5. In this case the distribution is extremely right-skewed.
Graph 3.7
Poisson distribution at λ = 3
As λ increases, the right asymmetry gradually changes from extreme to moderate (Graph 3.7).
Graph 3.8
Poisson distribution at λ = 7
At λ around 7 the distribution is approximately symmetric, with only a slight right asymmetry remaining (Graph 3.8).
Graph 3.9
Poisson distribution at λ = 15
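As λ increases above 7, the distribution becomes even closer to symmetric. Its skewness, 1/√λ, remains positive for every λ, so the Poisson distribution never becomes left-skewed; for large λ it approaches the normal distribution. The left asymmetry that appears in Graph 3.9 at λ = 15 is an effect of the plotted range: the horizontal axis ends at about x = 16, so the right tail of the distribution is cut off (Figure 3.9).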
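The skewness of the Poisson distribution is 1/√λ, which is positive for every λ and shrinks as λ grows. The sketch below computes the skewness numerically from the PDF for the four values of λ in Graphs 3.6 to 3.9 and compares it with 1/√λ. It also reports how much probability lies above x = 16; treating x = 16 as the end of the plotted axes is an assumption read from the tick labels of the original graphs. The helper names are our own.

```python
import math

def poisson_pdf(k: int, lam: float) -> float:
    # log-space evaluation avoids overflow for large k
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def poisson_skewness(lam: float, k_max: int = 400) -> float:
    """Third standardized moment, summed over k = 0..k_max (the tail beyond is negligible here)."""
    probs = [poisson_pdf(k, lam) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    third = sum((k - mean) ** 3 * p for k, p in enumerate(probs))
    return third / var ** 1.5

for lam in (0.5, 3, 7, 15):
    tail = max(0.0, 1 - sum(poisson_pdf(k, lam) for k in range(17)))  # P(X > 16)
    print(f"lambda = {lam:>4}: skewness = {poisson_skewness(lam):.3f}, "
          f"1/sqrt(lambda) = {1 / math.sqrt(lam):.3f}, P(X > 16) = {tail:.3f}")
```

A sizeable P(X > 16) at λ = 15 means that a plot ending near x = 16 shows only the left part of the distribution.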