Topic № 5 Discrete distributions

After reading Topic № 5, you will be able to:
- compute the probability of an event;
- model and predict real processes using the binomial and Poisson distributions;
- carry out quality control;
- predict technological accidents, security breaches, technological risk, etc.

A discrete (discontinuous) variable is a quantitative variable that can take only a finite or countable number of values within a given interval. Such a variable takes only certain values, for example countable values (integers) or dichotomous values (only two opposite values). Examples include the number of employees, companies or vehicles, the population of a given territory, amounts expressed in a given currency, etc. All categorical (qualitative) variables and some quantitative ones are discrete.

The discrete probability distributions used in modern practice include the binomial, Poisson, hypergeometric, negative binomial and multinomial distributions, among others. This topic discusses the binomial and Poisson distributions in detail and illustrates their practical applications.

1. Binomial distribution

The binomial distribution is one of the most commonly used theoretical distributions in various areas of real life, after the normal and lognormal distributions. It is the distribution of the number of successes in a series of trials of a dichotomous random variable (one that can take only 2 values). Its single-trial case (n = 1) is known in probability theory as the Bernoulli distribution. It is used in finance, insurance, business practice, experimental research, gambling, etc.
Features:
- it is a discontinuous (discrete), unimodal theoretical distribution;
- X can take only non-negative integer values (0, 1, ..., n);
- it is defined by only two parameters: the probability of success (p) and the number of attempts (n);
- it is symmetrical only when the probabilities of the two alternative events are equal, i.e., p = q = 0.5;
- when p < 0.5, positive (right) asymmetry is observed, which increases as p decreases;
- when p > 0.5, negative (left) asymmetry is observed, which increases as p increases;
- the probability of success is the same for all attempts;
- the sum of the probabilities over all possible values of X (the area under the PDF) is always equal to 1, regardless of the values of p and n.

The binomial distribution is appropriate when a real process or experiment has only two possible alternatives: positive or negative, profit or loss, success or failure, even or odd, etc. It is important that a large number of experiments is possible and that the probabilities do not change as the number of experiments increases.

Practical example 1
You decide to play roulette with the intention of making 3 attempts, betting only on an odd number. What is the probability that an odd number occurs 0, 1, 2 or 3 times?

Solution
For simplicity the zero pocket is ignored, so each bet has two equally likely results: even or odd. The probability of one result is p = ½ and of the other q = (1 - p) = ½, because the sum of all probabilities is always equal to 1, i.e. p + q = 1. The possible results for three bets are:
1. no odd number occurs (all three are even);
2. 1 odd number occurs;
3. 2 odd numbers occur;
4. 3 odd numbers occur.

Table 3.1 lists all possible combinations of the three betting attempts.

Table 3.1 Determining the combinations

Combination   | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8
First bet     | even | even | even | odd  | even | odd  | odd  | odd
Second bet    | even | even | odd  | even | odd  | even | odd  | odd
Third bet     | even | odd  | even | even | odd  | odd  | even | odd
Number of odd | 0    | 1    | 1    | 1    | 2    | 2    | 2    | 3

Table 3.2 presents the calculation of the probabilities for each of the results listed in Table 3.1.
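The enumeration in Table 3.1 can also be reproduced programmatically. The following Python sketch is only an illustration: the variable names are hypothetical, and it assumes, as the example does, that even and odd are equally likely on every bet. It lists all sequences of three bets and counts how many contain 0, 1, 2 or 3 odd results.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely sequences of three bets.
outcomes = list(product(["even", "odd"], repeat=3))

# Count how many sequences contain 0, 1, 2 or 3 odd results.
counts = Counter(seq.count("odd") for seq in outcomes)

# Probability = favorable sequences / all sequences.
probabilities = {k: Fraction(counts[k], len(outcomes)) for k in sorted(counts)}

for k, prob in probabilities.items():
    print(f"{k} odd: {counts[k]} of {len(outcomes)} sequences, p = {float(prob):.3f}")
```

Exact fractions are used so that the probabilities add up to exactly 1 without rounding effects.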
The probability is calculated by dividing the number of favorable cases (combinations with the given number of odd results) by the number of all possible cases (8).

Table 3.2 Calculation of probabilities

Number of odd results (value of the random variable) | 0           | 1           | 2           | 3           | Total
Probability                                          | 1/8 = 0.125 | 3/8 = 0.375 | 3/8 = 0.375 | 1/8 = 0.125 | 1.00

Based on the data in Table 3.2, the shape of the probability density function of the binomial distribution is visualized as a histogram in Graph 3.1.

Graph 3.1

The analytical expression of the probability density function of the binomial distribution has the following form:

$$\mathrm{PDF}(x) = C_N^x \, p^x q^{N-x}$$

where:
X is a random discrete variable representing the number of positive (or negative) results in N experiments;
N is the number of trials;
$C_N^x$ is the number of combinations of N elements taken x at a time;
p is the probability of a positive result;
q is the probability of a negative result.

The number of combinations of N elements taken x at a time is calculated by the following formula:

$$C_N^x = \frac{N!}{x!\,(N-x)!}$$

The probability of a negative result is q = 1 - p. After substitution, the probability density function of the binomial distribution takes the following expanded form:

$$\mathrm{PDF}(x) = \frac{N!}{x!\,(N-x)!} \, p^x (1-p)^{N-x}$$

This formula is known as Bernoulli's formula, named after the famous 17th century Swiss mathematician Jakob Bernoulli, who is considered one of the founders of probability theory. It can be used to calculate the probabilities of all possible outcomes when there are only two possible alternatives.

Practical example 2
Using the data from the previous example, what is the probability that an odd number occurs exactly 2 times if we bet 3 times on an odd number at roulette?

Solution
We substitute N = 3, x = 2 and p = 0.5 into the probability density function of the binomial distribution:

$$\mathrm{PDF}(2) = \frac{3 \cdot 2 \cdot 1}{(2 \cdot 1)(1)} \cdot 0.5^{2} \, (1 - 0.5)^{1} = 0.375$$

i.e. the probability is 37.5%, which agrees with the value obtained by counting combinations in Table 3.2.
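Bernoulli's formula translates directly into code. The sketch below is a minimal Python illustration; the function name binomial_pdf is hypothetical and not part of the text. It evaluates the formula for practical example 2 and for all four outcomes of the three bets.

```python
from math import comb

def binomial_pdf(x: int, n: int, p: float) -> float:
    """P(X = x) for n independent trials with success probability p."""
    # comb(n, x) is the number of combinations n! / (x! (n - x)!).
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Practical example 2: exactly 2 odd numbers in 3 bets with p = 0.5.
print(binomial_pdf(2, 3, 0.5))  # 0.375

# The whole distribution for 3 bets; the probabilities sum to 1.
distribution = [binomial_pdf(x, 3, 0.5) for x in range(4)]
print(distribution, sum(distribution))
```

The printed distribution matches the probabilities obtained by counting combinations: 0.125, 0.375, 0.375 and 0.125.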
For convenience, the Excel function BINOMDIST, which evaluates the probability density function of the binomial distribution, can be used (Figure 3.1).

Figure 3.1 Probability calculation using BINOMDIST

Excel's statistical function BINOMDIST can calculate both the probability of a single value (PDF) and the cumulative probability (CDF) of the binomial distribution. In practical example 2 we want the probability of a single event, so the last field of the dialog box, Cumulative, must be set to FALSE (or 0); Excel then uses the probability density function (Figure 3.1).

Practical example 3
What is the probability that heads occurs exactly 3 times if you toss a coin 5 times?

Answer

$$\mathrm{PDF}(3) = \frac{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(3 \cdot 2 \cdot 1)(2 \cdot 1)} \cdot 0.5^{3} \, (1 - 0.5)^{2} = 0.3125$$

Using Excel's binomial distribution PDF formula, the calculation is as follows:

= BINOMDIST (3; 5; 0.5; FALSE) = 0.3125

i.e. the probability of heads occurring 3 times in 5 tosses is 31.25%.

The shape of the probability function of the binomial distribution depends on the probability p and the number of experiments n. The BINOMDIST function of Excel was used to visualize the different shapes. Graph 3.2 shows the shape of the binomial distribution with p = 0.10 and n = 10. A strong right (positive) asymmetry is observed.

Graph 3.2 PDF at p = 0.10 and n = 10

Graph 3.3 shows that the right asymmetry changes from strong to moderate as the probability increases.

Graph 3.3 PDF at p = 0.25 and n = 10

It can be concluded that when the probability of a favorable result is less than 0.5, right asymmetry is observed. If both probabilities are equal, the distribution is symmetric, and if the probability is above 0.5, left asymmetry is observed. Graph 3.4 shows that the binomial distribution is symmetric when the probability of success is equal to the probability of failure.
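As a cross-check on the BINOMDIST calculation and on the shapes described for p = 0.10, 0.25, 0.5 and 0.7, the following Python sketch may help. The helper names binomdist and skewness are hypothetical (binomdist only mimics the argument order of the Excel function), and the skewness formula (q - p) / √(npq) is a standard result that the text itself does not state.

```python
from math import comb, sqrt

def binomdist(x: int, n: int, p: float, cumulative: bool) -> float:
    """Binomial probability; cumulative=True returns P(X <= x), else P(X = x)."""
    def pdf(k: int) -> float:
        return comb(n, k) * p**k * (1 - p) ** (n - k)
    if cumulative:
        return sum(pdf(k) for k in range(x + 1))
    return pdf(x)

def skewness(n: int, p: float) -> float:
    """Skewness of the binomial distribution: (q - p) / sqrt(n p q)."""
    q = 1 - p
    return (q - p) / sqrt(n * p * q)

# Practical example 3: 3 heads in 5 tosses.
print(binomdist(3, 5, 0.5, False))  # 0.3125

# Sign and size of the asymmetry for the four plotted cases (n = 10).
for p in (0.10, 0.25, 0.5, 0.7):
    print(f"p = {p}: skewness = {skewness(10, p):+.3f}")
```

A positive skewness corresponds to right asymmetry, zero to symmetry and a negative value to left asymmetry.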
Graph 3.4 PDF at p = 0.5 and n = 10

Graph 3.5 shows that when the probability of a favorable result is higher than that of an unfavorable one, left asymmetry is observed.

Graph 3.5 PDF at p = 0.7 and n = 10

Practical example 4
Defective video cards have been installed in 12 of the last batch of 80 computers in a computer assembly company.
a) A customer wants to buy 5 computers immediately. What is the probability that exactly one of them has a defective card?
b) What is the probability that, out of 30 computers sold, no more than 4 are defective?

Solution
a) The probability that a computer is defective is 12/80 = 0.15. We then calculate the probability using the BINOMDIST function of Excel, which evaluates the PDF of the binomial distribution:

Figure 3.2

The probability that exactly one of the 5 computers has a defective card is 39.15%.

b) In this case we do not need the probability of one particular value but the probability that the value does not exceed a given number, i.e., we must use the cumulative distribution function (CDF) of the binomial distribution.

Figure 3.3

To calculate the cumulative probability, the last field of the BINOMDIST dialog box must be set to TRUE or 1 (Figure 3.3). As can be seen from Figure 3.3, the probability that no more than 4 of the 30 computers sold are defective is 0.5245, i.e. 52.45%.

Calculating the moments of the binomial distribution

In the binomial distribution, the arithmetic mean (mathematical expectation) of the outcome of a single trial is calculated by the following formula:

$$\mu = p$$

i.e. the arithmetic mean is the relative proportion (probability) of a positive outcome. The standard deviation of a single trial is calculated as:

$$\sigma = \sqrt{p q} = \sqrt{p (1 - p)}$$

For the number of positive results X in n trials, these quantities become $\mu = n p$ and $\sigma = \sqrt{n p q}$.

The covariance is equal to:

$$\mathrm{Cov}_{X,Y} = \sigma_X \, \sigma_Y \, \rho_{X,Y}$$

where $\rho_{X,Y}$ is the correlation coefficient between X and Y.

2.
Poisson distribution

The Poisson distribution was named after the French mathematician Siméon Denis Poisson, who presented it to the general public in 1837.¹ It is a limiting form of the binomial distribution when p tends to 0 and n increases indefinitely. The Poisson distribution is the probability distribution of a discrete random variable that counts statistically independent events occurring within a unit of time or space. It is used in practice for quality control and risk measurement, i.e. for measuring the number (frequency) of occurrences of an event in a certain period of time (e.g. accidents per month, calls per hour, errors per 1000 transactions, aircraft landings per unit of time, etc.).

¹ Letkowski J., Applications of the Poisson probability distribution, Western New England University, 2012

Features:
- it is a discontinuous (discrete), unimodal theoretical distribution;
- it is defined by only one parameter, because the arithmetic mean and the variance coincide, i.e. µ = σ² = λ;
- the closer λ is to 0, the greater the right asymmetry, and the distribution approaches a reverse-J shape;
- the right asymmetry decreases as λ increases, and at about λ = 7 an approximately symmetrical distribution is observed; for larger λ the distribution becomes still more symmetrical (its skewness is 1/√λ, which is positive for every λ), so no left asymmetry develops;
- X can take only non-negative integer values (0, 1, 2, ...);
- the sum of the probabilities over all possible values of X (the area under the PDF) is always equal to 1, regardless of the value of λ.

The probability density function of the Poisson distribution is calculated by the following formula:

$$\mathrm{PDF}(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$

where:
λ is the parameter that determines the shape and location of the distribution;
e is the mathematical constant known as Euler's number (not to be confused with Euler's constant), after the Swiss mathematician Leonhard Euler, or as Napier's constant; it is approximately equal to 2.71828182845904.
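The Poisson formula and the property µ = σ² = λ can be checked numerically. The sketch below is a minimal Python illustration; the function name poisson_pdf is hypothetical, λ = 3 is chosen only for illustration, and the infinite sum is truncated at x = 59, which is an assumption that is harmless here because the remaining terms are negligible.

```python
from math import exp, factorial

def poisson_pdf(x: int, lam: float) -> float:
    """P(X = x) for a Poisson distribution with parameter lam."""
    return lam**x * exp(-lam) / factorial(x)

lam = 3
# Truncate the infinite support at x = 59; the omitted tail is negligible.
probs = [poisson_pdf(x, lam) for x in range(60)]

total = sum(probs)
mean = sum(x * p for x, p in enumerate(probs))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(probs))

print(f"sum = {total:.6f}, mean = {mean:.6f}, variance = {variance:.6f}")
```

Both the mean and the variance come out equal to λ, which is why a single parameter is enough to define the distribution.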
The Excel function for the PDF of the Poisson distribution is entered as follows:

= POISSON (X; λ; FALSE)

Practical example 5
If an average of 4 accidents per month is observed in an operational process, what is the probability that 6 will occur?

Solution

$$\mathrm{PDF}(X = 6) = \frac{\lambda^{x} e^{-\lambda}}{x!} = \frac{4^{6} \, e^{-4}}{6!} = 0.1042$$

Figure 3.4 Calculating using the Excel POISSON function

Practical example 6
If an average of 64 breaches of the bank's internal network occur per year, what is the probability that 90 will occur?

Solution
With X = 90 and λ = 64, the probability that the number of breaches increases to 90 next year is only 0.039%.

The shape of the PDF of the Poisson distribution depends only on the parameter λ.

Graph 3.6 Poisson distribution at λ = 0.5

Graph 3.6 visualizes the shape of the Poisson distribution at λ = 0.5. In this case, an extremely right asymmetric distribution is observed.

Graph 3.7 Poisson distribution at λ = 3

As the value of λ increases, the right asymmetry gradually changes from extreme to moderate (Graph 3.7).

Graph 3.8 Poisson distribution at λ = 7

A relatively symmetric distribution is observed at a value of the parameter λ around 7 (Graph 3.8).

Graph 3.9 Poisson distribution at λ = 15

As λ increases above 7, the peak of the distribution moves further to the right. At λ = 15 the peak lies near the right edge of the plotted range, so Graph 3.9 shows mainly the rising left side of the distribution and appears left asymmetric. The full distribution, however, remains slightly right asymmetric (its skewness is 1/√λ) and becomes more symmetric as λ grows.
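The two Poisson examples and the four plotted cases can be reproduced with a short Python sketch. The function name poisson_pdf is hypothetical; the logarithmic form of the formula is a design choice made here to avoid very large intermediate numbers such as 90!; and the skewness 1/√λ is a standard result that is included only as a numerical check on the shapes in Graphs 3.6 to 3.9.

```python
from math import exp, lgamma, log, sqrt

def poisson_pdf(x: int, lam: float) -> float:
    """P(X = x), computed as exp(x ln(lam) - lam - ln(x!))."""
    # lgamma(x + 1) equals ln(x!), so no huge factorial is formed.
    return exp(x * log(lam) - lam - lgamma(x + 1))

# Practical example 5: 6 accidents when the monthly average is 4.
print(round(poisson_pdf(6, 4), 4))  # 0.1042

# Practical example 6: 90 breaches when the yearly average is 64.
print(f"{poisson_pdf(90, 64):.3%}")

# Theoretical skewness for the four plotted values of lambda.
for lam in (0.5, 3, 7, 15):
    print(f"lambda = {lam}: skewness = {1 / sqrt(lam):.3f}")
```

The skewness shrinks steadily as λ grows but never changes sign.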