Omar Atef Mohamed Abdelrahman Arab
Electromechanical Program
Applied Engineering Probability and Mathematical Statistics (FRB 201)

Discrete Probability Distribution

Introduction:
A discrete probability distribution is a type of probability distribution that lists all possible values of a discrete random variable together with their associated probabilities. In other words, a discrete probability distribution gives the likelihood of occurrence of each possible value of a discrete random variable. Geometric, binomial, and Bernoulli distributions are some commonly used discrete probability distributions. This article covers the definition of a discrete probability distribution, its formulas, its types, and various associated examples.

What is a Discrete Probability Distribution?
A discrete probability distribution and a continuous probability distribution are the two types of probability distributions, defining discrete and continuous random variables respectively. A probability distribution is a function that describes all possible values of a random variable together with their associated probabilities.

Discrete Probability Distribution Definition
A discrete probability distribution gives the probability that a discrete random variable takes a specified value. Such a distribution represents data with a countable number of outcomes (finite or countably infinite). A discrete probability distribution must satisfy two conditions:
• 0 ≤ P(X = x) ≤ 1. The probability of a discrete random variable X taking on an exact value x lies between 0 and 1.
• ∑P(X = x) = 1. The probabilities of all possible values must sum to 1.
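These two conditions can be checked directly in code. The short Python sketch below (a hypothetical example using the PMF of a fair six-sided die) verifies both conditions:

```python
# Hypothetical example: the PMF of a fair six-sided die, P(X = x) = 1/6.
pmf = {x: 1 / 6 for x in range(1, 7)}

# Condition 1: every probability lies between 0 and 1.
assert all(0 <= p <= 1 for p in pmf.values())

# Condition 2: the probabilities sum to 1 (within floating-point tolerance).
assert abs(sum(pmf.values()) - 1.0) < 1e-9
print("both conditions hold")
```

Any dictionary of values and probabilities that passes both checks is a valid discrete probability distribution.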
Binomial Probability Distribution
The binomial distribution describes the number of successes in a sequence of n independent experiments, each asking a yes-no question, where the Boolean-valued outcome is either success/yes/true/one (with probability p) or failure/no/false/zero (with probability q = 1 − p). A single success/failure experiment is called a Bernoulli trial or Bernoulli experiment, and a sequence of such trials is called a Bernoulli process. For n = 1, i.e. a single experiment, the binomial distribution reduces to the Bernoulli distribution. The binomial distribution is also the basis for the well-known binomial test of statistical significance.

Negative Binomial Distribution
In probability theory and statistics, the distribution of the number of successes in a series of independent and identically distributed Bernoulli trials before a specified number of failures occurs is called the negative binomial distribution. Here the number of failures is denoted by r. For instance, suppose we roll a die and count a 1 as a failure and any non-1 as a success. If we roll the die repeatedly until a 1 appears for the third time (r = 3 failures), then the number of non-1s that appeared follows a negative binomial distribution.

Binomial Distribution Examples
As noted above, the binomial distribution gives the probability of each possible number of successes. In real life, the concept is used for:
• Estimating the quantities of raw and used materials when making a product.
• Surveying positive and negative reviews from the public for a specific product or place.
• Using a YES/NO survey to count how many people watch a particular channel.
• Counting the number of male and female employees in an organisation.
• Counting the votes for a candidate in an election, where each ballot is a 0-or-1 outcome.
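As an illustration of the binomial setting described above, the Python sketch below computes the standard binomial PMF, P(X = k) = C(n, k) p^k (1 − p)^(n − k), for a hypothetical yes/no survey (the function name and parameter values are chosen for this example):

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k): probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example: probability of exactly 3 "yes" answers in a 5-person yes/no survey,
# where each person independently answers "yes" with probability 0.5.
print(binomial_pmf(3, 5, 0.5))  # -> 0.3125

# Like every discrete distribution, the PMF sums to 1 over k = 0..n.
assert abs(sum(binomial_pmf(k, 5, 0.5) for k in range(6)) - 1.0) < 1e-9
```

Libraries expose the same quantity ready-made, e.g. `scipy.stats.binom.pmf`.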
Binomial Distribution vs Normal Distribution
The main difference between the binomial distribution and the normal distribution is that the binomial distribution is discrete, whereas the normal distribution is continuous: the binomial distribution has a countable set of possible outcomes, whereas the normal distribution ranges over a continuum of values. When the sample size for the binomial distribution is very large, however, its distribution curve closely resembles the normal curve.

Properties of Binomial Distribution
The properties of the binomial distribution are:
• There are two possible outcomes: true or false, success or failure, yes or no.
• There is a fixed number n of independent, repeated trials.
• The probability of success (or failure) remains the same for each trial.
• Only the number of successes out of the n independent trials is counted.
• Every trial is independent: the outcome of one trial does not affect the outcome of another.

Geometric Distribution
• Introduction
A. Definition of the Geometric Distribution
1. A discrete probability distribution modeling the number of Bernoulli trials needed to achieve the first success.
2. Often applied to situations with a binary outcome, such as success/failure or heads/tails.
• Characteristics of the Geometric Distribution
A. Repeated Trials Until the First Success
1. Each trial is independent.
2. The probability of success remains constant across all trials.
• Interpretation of Parameters
A. p (Probability of Success)
1. The likelihood of success on each trial.
2. Determines the shape of the distribution.
• Examples and Applications
A. Tossing a Coin
1. Modeling the number of tosses until the first head.
B. Manufacturing Defects
1. Analyzing the number of products produced until the first defective item.
C. Waiting Time
1. Modeling the time until the first car arrives at a traffic signal.
• Properties and Relationships
A. Memorylessness Property
1. The probability of success on the next trial remains the same, regardless of past outcomes.
B. Relationship with the Exponential Distribution
1. The geometric distribution is the discrete-time analogue of the continuous-time exponential distribution.
• Real-world Applications
A. Queuing Theory
1. Analyzing waiting times until the first customer arrives.
B. Reliability Engineering
1. Estimating the time until the first failure in a system.
• Simulation and Modeling
A. Monte Carlo Simulations
1. Simulating real-world scenarios using the geometric distribution.
B. Applications in Risk Analysis
1. Assessing risks associated with waiting times.
• Challenges and Limitations
A. Assumes Independence
1. The trials must be independent for the model to be accurate.
B. Continuous-Time Variations
1. Real-world scenarios may involve continuous time, requiring adaptations.

Negative Binomial Distribution
A. Definition and Characteristics
1. Repeated Trials Until k Successes:
• The negative binomial distribution models the number of independent and identical Bernoulli trials required to achieve k successes.
• Unlike the geometric distribution, where the focus is on the first success, the negative binomial models the number of trials until a specified number (k) of successes occurs.
2. Independence:
• Each trial is independent of the others; the outcome of one trial does not affect the outcome of another.
3. Constant Probability of Success:
• The probability of success (p) remains constant across all trials.
• This probability is the same as in the corresponding geometric distribution.
B. Probability Mass Function (PMF)
• In its standard form, the probability that the r-th success occurs on trial n is P(X = n) = C(n − 1, r − 1) p^r (1 − p)^(n − r), for n = r, r + 1, r + 2, …
• Interpretation of Parameters (r, p):
• r (Number of Successes): The specified number of successes (k) to be achieved.
• p (Probability of Success): The probability of success on each trial.
C. Contrasting with the Binomial Distribution
• Binomial Distribution:
• Models the number of successes in a fixed number (n) of trials.
• In contrast, the negative binomial distribution models the number of trials until a specified number of successes.
D. Examples and Applications
• Number of Failures Before k Successes:
• In quality control, it could represent the number of defective products produced before k non-defective ones.
• In a series of medical tests, it might signify the number of failed diagnoses before k correct ones.
• In sports, it could represent the number of failed attempts before a team scores k goals.
The negative binomial distribution is a versatile tool for modeling situations where the focus is on the number of trials needed to achieve a specific number of successes, providing a more flexible framework than the binomial distribution for certain scenarios.

Simulation and Modeling
A. Monte Carlo Simulations
Monte Carlo simulations use random sampling to model the probability distribution of an uncertain outcome. In the context of discrete probability distributions such as the negative binomial distribution, Monte Carlo simulation is used to generate a large number of random samples and observe the resulting distribution. This method is particularly useful for complex systems and for situations where analytical solutions are hard to obtain.
B. Applications in Risk Analysis
1. Financial Markets:
• Assessing the risk of investment portfolios by modeling the distribution of potential returns.
• Evaluating the impact of market uncertainties on financial instruments.
2. Project Management:
• Estimating project completion times and identifying potential bottlenecks.
• Analyzing the risk of delays and cost overruns.
3. Healthcare:
• Modeling the spread of diseases and assessing the effectiveness of different intervention strategies.
• Analyzing the risk of epidemics and pandemics.
4. Environmental Sciences:
• Evaluating the risk of natural disasters such as floods, earthquakes, or hurricanes.
• Assessing the impact of climate change on ecosystems.
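The Monte Carlo approach described above can be sketched in a few lines of Python. The example below (illustrative only; the parameter values p = 0.5 and k = 3 are arbitrary) draws negative binomial samples by repeating Bernoulli trials until the k-th success, then compares the empirical mean number of trials with the theoretical value k/p:

```python
import random

def trials_until_k_successes(p: float, k: int, rng: random.Random) -> int:
    """Run Bernoulli(p) trials until the k-th success; return the trial count."""
    trials = successes = 0
    while successes < k:
        trials += 1
        if rng.random() < p:
            successes += 1
    return trials

rng = random.Random(42)  # fixed seed so the experiment is reproducible
p, k, n_sims = 0.5, 3, 100_000

samples = [trials_until_k_successes(p, k, rng) for _ in range(n_sims)]
empirical_mean = sum(samples) / n_sims

# Theory: the expected number of trials until the k-th success is k / p = 6.
print(f"empirical mean trials: {empirical_mean:.2f} (theory: {k / p})")
```

With 100,000 samples the empirical mean lands close to the theoretical value of 6; this kind of convergence check is exactly how Monte Carlo studies are validated against known results before being applied to problems with no analytical solution.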
C. Software and Tools for Simulation
1. R:
• An open-source statistical programming language with extensive packages for probability distributions and simulation.
2. Python:
• Popular for its scientific computing libraries, such as NumPy and SciPy, which facilitate simulation studies.
3. MATLAB:
• Widely used for mathematical modeling and simulation, offering toolboxes for probability and statistics.
4. Monte Carlo Simulation Software:
• Dedicated packages such as @Risk, Crystal Ball, and Simul8 that streamline setting up and running Monte Carlo simulations.

Challenges and Limitations
A. Practical Issues in Applying Discrete Probability Distributions
1. Complexity of Real-world Scenarios:
• Simulating real-world situations accurately can be challenging because of the complexity of systems and their interactions.
2. Resource Intensive:
• Running a large number of simulations for high precision may require significant computational resources.
B. Sensitivity to Assumptions
1. Model Assumptions:
• The accuracy of a simulation depends heavily on the assumptions made during modeling; deviations from those assumptions can distort the results.
2. Parameter Estimation:
• The precision of the parameter estimates for the probability distribution directly influences the reliability of the simulation.
C. Data Requirements and Availability
1. Limited Data:
• Insufficient data can hinder accurate estimation of distribution parameters, leading to unreliable simulations.
2. Data Quality:
• Poor-quality data may introduce biases and inaccuracies into the simulations.
3. Data Availability:
• In some cases, obtaining relevant data for specific scenarios may be difficult or impossible.
Effectively addressing these challenges requires a careful balance between model complexity, data quality, and the practical constraints of the simulation process. Sensitivity analyses and robustness testing are crucial for understanding the reliability of simulation results.
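The sensitivity to parameter estimation and limited data discussed above can be demonstrated with a small simulation. The Python sketch below (a hypothetical setup with an assumed true success probability of 0.3) repeatedly estimates p from small and large Bernoulli samples and compares the spread of the resulting estimates:

```python
import random

rng = random.Random(0)  # fixed seed for reproducibility
true_p = 0.3            # assumed "true" success probability for this sketch

def estimate_p(n_obs: int) -> float:
    """Estimate the success probability from n_obs simulated Bernoulli trials."""
    return sum(rng.random() < true_p for _ in range(n_obs)) / n_obs

# Repeat each "study" 200 times and compare the spread of the estimates.
small = [estimate_p(20) for _ in range(200)]     # limited data
large = [estimate_p(2_000) for _ in range(200)]  # plentiful data

def spread(estimates):
    return max(estimates) - min(estimates)

print(f"spread with n=20:   {spread(small):.3f}")
print(f"spread with n=2000: {spread(large):.3f}")
```

The estimates from the 20-observation studies vary far more than those from the 2,000-observation studies, which is why simulations driven by parameters fitted to limited data should be treated with caution.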
Continuous Probability Distributions
The normal distribution, also known as the Gaussian distribution or bell curve, is a continuous probability distribution that is symmetric around its mean, μ. It is characterized by its probability density function (PDF):
f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))
1. Symmetry: The normal distribution is symmetric around its mean (μ), meaning that the left and right tails are mirror images of each other.
2. Bell-shaped curve: The probability density function forms a bell-shaped curve, with its highest point at the mean.
3. Mean and standard deviation: The mean (μ) specifies the center of the distribution, and the standard deviation (σ) controls the spread or dispersion. About 68% of the data falls within one standard deviation of the mean, about 95% within two standard deviations, and about 99.7% within three standard deviations.
4. Area under the curve: The total area under the curve equals 1, representing the probability that the random variable falls somewhere in its range of possible values.
5. Central Limit Theorem: The sum (or average) of a large number of independent and identically distributed random variables, regardless of the original distribution, is approximately normally distributed.
The normal distribution is widely used in statistics and probability theory because of its mathematical tractability and its emergence in many natural phenomena. Many statistical methods assume that the underlying data follow a normal distribution, an assumption that is often reasonable because of the Central Limit Theorem.
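As a numerical check of the 68-95-99.7 rule stated above, the Python sketch below draws a large number of standard normal samples and measures the fraction falling within one, two, and three standard deviations of the mean:

```python
import random

rng = random.Random(1)  # fixed seed for reproducibility
mu, sigma, n = 0.0, 1.0, 100_000

# Draw n normally distributed samples, then measure the fraction that falls
# within one, two, and three standard deviations of the mean.
xs = [rng.gauss(mu, sigma) for _ in range(n)]
for k, rule_of_thumb in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    frac = sum(abs(x - mu) <= k * sigma for x in xs) / n
    print(f"within {k} sigma: {frac:.3f} (rule of thumb: {rule_of_thumb})")
```

The empirical fractions come out very close to 0.68, 0.95, and 0.997, confirming the rule for the simulated data.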