
Probability Report

Omar Atef Mohamed Abdelrahman Arab
Electromechanical Program
Applied Engineering Probability and
Mathematical Statistics (FRB 201)
Discrete Probability Distribution
Introduction:
A discrete probability distribution is a type of probability distribution that
shows all possible values of a discrete random variable along with their
associated probabilities. In other words, a discrete probability
distribution gives the likelihood of occurrence of each possible value of a
discrete random variable.
Geometric distributions, binomial distributions, and Bernoulli
distributions are some commonly used discrete probability distributions.
This report sheds light on the definition of a discrete probability
distribution, its formulas, types, and various associated examples.
What is a Discrete Probability Distribution?
A discrete probability distribution and a continuous probability
distribution are two types of probability distributions that define discrete
and continuous random variables respectively. A probability distribution
can be defined as a function that describes all possible values of a random
variable as well as the associated probabilities.
Discrete Probability Distribution Definition
A discrete probability distribution can be defined as a probability
distribution giving the probability that a discrete random variable will take
a specified value. Such a distribution represents data with a countable
number of outcomes. There are two conditions that a discrete
probability distribution must satisfy. These are given as follows:
• 0 ≤ P(X = x) ≤ 1. This implies that the probability of a discrete random variable,
X, taking on an exact value, x, lies between 0 and 1.
• ∑P(X = x) = 1. The sum of all probabilities must be equal to 1.
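As a minimal sketch of these two conditions, the Python check below verifies that a small hand-written table of probabilities forms a valid discrete distribution; the values (a loaded four-sided die) are illustrative assumptions, not taken from the report.

```python
# Check that a table of probabilities forms a valid discrete distribution.
# The values (a loaded four-sided die) are illustrative assumptions.
pmf = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4}

in_range = all(0.0 <= p <= 1.0 for p in pmf.values())   # 0 <= P(X = x) <= 1
sums_to_one = abs(sum(pmf.values()) - 1.0) < 1e-9        # sum of P(X = x) equals 1

print(in_range and sums_to_one)  # True: a valid discrete probability distribution
```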
Binomial Probability Distribution
The binomial probability distribution models the number of ‘successes’ in a
sequence of n experiments, where each experiment asks a yes-no question and the
Boolean-valued outcome is represented either as success/yes/true/one
(probability p) or failure/no/false/zero (probability q = 1 − p). A single
success/failure test is also called a Bernoulli trial or Bernoulli experiment,
and a series of such outcomes is called a Bernoulli process. For n = 1, i.e. a single
experiment, the binomial distribution is a Bernoulli distribution. The
binomial distribution is the basis for the well-known binomial test of statistical
significance.
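For reference, the probability of exactly k successes in n such trials is P(X = k) = C(n, k) · p^k · q^(n − k). A minimal sketch, assuming a fair coin (p = 0.5) tossed n = 10 times (illustrative values, not from the report):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Illustrative numbers: probability of exactly 6 heads in 10 fair coin tosses.
print(binomial_pmf(6, 10, 0.5))                            # ~0.2051
print(sum(binomial_pmf(k, 10, 0.5) for k in range(11)))    # 1.0 (probabilities sum to 1)
```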
Negative Binomial Distribution
In probability theory and statistics, the negative binomial distribution
describes the number of successes in a series of independent and identically
distributed Bernoulli trials before a specified number of failures occurs.
Here the number of failures is denoted by ‘r’. For
instance, suppose we throw a die and treat the occurrence of a 1 as a failure
and every non-1 as a success. If we keep throwing the die until a 1
appears for the third time, i.e. r = three failures, then the probability distribution
of the number of non-1s that appeared is a negative binomial
distribution.
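A small sketch of this dice example using SciPy (a tooling assumption, not part of the original text). Note that scipy.stats.nbinom counts “failures” before the n-th “success”, so rolling a 1 plays the role of the success here, with the labels swapped relative to the text:

```python
from scipy import stats

# Dice example: count non-1 rolls until a 1 has appeared for the third time.
# scipy.stats.nbinom(n, p) counts failures before the n-th success, so we
# treat rolling a 1 as the "success" (p = 1/6) and a non-1 roll as a "failure".
r, p = 3, 1 / 6
dist = stats.nbinom(r, p)

print(dist.pmf(0))    # first three rolls are all 1s: (1/6)**3 ≈ 0.0046
print(dist.mean())    # expected number of non-1 rolls before the third 1: r*(1-p)/p = 15
```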
Binomial Distribution Examples
As we already know, the binomial distribution gives the probabilities of the
different possible outcomes. In real life, the concept is used for:
• Finding the quantity of raw and used materials while making a product.
• Taking a survey of positive and negative reviews from the public for a specific product or place.
• Using a YES/NO survey to check how many people view a particular channel.
• Finding the number of male and female employees in an organisation.
• Counting the votes collected by a candidate in an election, where each vote is recorded as a 0 or a 1.
Binomial Distribution Vs Normal Distribution
The main difference between the binomial distribution and the normal
distribution is that the binomial distribution is discrete, whereas the normal
distribution is continuous. This means that the binomial distribution has a finite
number of events, whereas the normal distribution has an infinite number of
events. However, if the sample size for the binomial distribution is very large,
then the distribution curve of the binomial distribution looks very much like the
normal distribution curve.
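A quick numerical sketch of this large-sample behaviour, comparing binomial probabilities with the matching normal density (n = 1000 and p = 0.3 are illustrative assumptions):

```python
from scipy import stats

# For large n, Binomial(n, p) is well approximated by Normal(n*p, sqrt(n*p*(1-p))).
n, p = 1000, 0.3                                   # illustrative values
binom = stats.binom(n, p)
normal = stats.norm(n * p, (n * p * (1 - p)) ** 0.5)

for k in (280, 300, 320):
    # The discrete PMF and the continuous density are nearly equal at each point.
    print(k, round(binom.pmf(k), 5), round(normal.pdf(k), 5))
```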
Properties of Binomial Distribution
The properties of the binomial distribution are:
• There are two possible outcomes: true or false, success or failure, yes or no.
• There is a fixed number, n, of independent repeated trials.
• The probability of success or failure remains the same for each trial.
• Only the number of successes out of the n independent trials is counted.
• Every trial is an independent trial, which means the outcome of one trial does not affect the outcome of another trial.
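These properties can be illustrated by simulation: repeating a fixed number of independent trials with a constant success probability and counting the successes reproduces the binomial probabilities. The parameters and seed below are illustrative assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, p, reps = 20, 0.4, 100_000          # illustrative parameters

# Each row is one experiment of n independent Bernoulli(p) trials;
# the per-row success counts should follow Binomial(n, p).
successes = (rng.random((reps, n)) < p).sum(axis=1)

print(np.mean(successes == 8))         # empirical estimate of P(X = 8)
print(stats.binom.pmf(8, n, p))        # exact value, ~0.180
```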
Geometric Distribution
• Introduction
  A. Definition of the Geometric Distribution
    1. A discrete probability distribution for modeling the number of Bernoulli trials needed to achieve the first success.
    2. Often applied to situations with a binary outcome, such as success/failure or heads/tails.
• Characteristics of the Geometric Distribution
  A. Single Trial Until First Success
    1. Each trial is independent.
    2. The probability of success remains constant across all trials.
• Interpretation of Parameters
  A. p (Probability of Success)
    1. The likelihood of success on each trial.
    2. Determines the shape of the distribution.
• Examples and Applications
  A. Tossing a Coin
    1. Modeling the number of tosses until the first head.
  B. Manufacturing Defects
    1. Analyzing the number of products produced until the first defective item.
  C. Waiting Time
    1. Modeling the time until the first car arrives at a traffic signal.
• Properties and Relationships
  A. Memorylessness Property
    1. The probability of success in the next trial remains the same, regardless of past outcomes (see the code sketch after this outline).
  B. Relationship with Exponential Distribution
    1. The geometric distribution is related to the exponential distribution in continuous time.
• Real-world Applications
  A. Queuing Theory
    1. Analyzing waiting times until the first customer arrives.
  B. Reliability Engineering
    1. Estimating the time until the first failure in a system.
• Simulation and Modeling
  A. Monte Carlo Simulations
    1. Simulating real-world scenarios using the geometric distribution.
  B. Applications in Risk Analysis
    1. Assessing risks associated with waiting times.
• Challenges and Limitations
  A. Assumes Independence
    1. The trials must be independent for the model to be accurate.
  B. Continuous-Time Variations
    1. Real-world scenarios may involve continuous time, requiring adaptations.
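A minimal sketch of the geometric distribution and its memorylessness property, using scipy.stats.geom, which counts the number of trials up to and including the first success (the success probability p = 0.25 and the values of s and t are illustrative assumptions):

```python
from scipy import stats

p = 0.25                          # illustrative probability of success per trial
geom = stats.geom(p)              # trials up to and including the first success

print(geom.pmf(1), geom.pmf(3))   # P(X = 1) = p, P(X = 3) = (1 - p)**2 * p
print(geom.mean())                # expected number of trials: 1/p = 4

# Memorylessness: P(X > s + t | X > s) equals P(X > t).
s, t = 5, 3
print(geom.sf(s + t) / geom.sf(s), geom.sf(t))   # both equal (1 - p)**t ≈ 0.4219
```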
Negative Binomial Distribution
A. Definition and Characteristics
1. Repeated Trials Until k Successes:
   • The negative binomial distribution models the number of independent and identical Bernoulli trials required to achieve k successes.
   • Unlike the geometric distribution, where we focus on the first success, the negative binomial allows for modeling the number of trials until a specified number (k) of successes occurs.
2. Independence:
   • Each trial is independent of the others. The outcome of one trial does not affect the outcome of another.
3. Constant Probability of Success:
   • The probability of success (p) remains constant across all trials.
   • This probability is the same as in the corresponding geometric distribution.
B. Probability Mass Function (PMF)
For the number of trials X needed to obtain the r-th success, the PMF is
P(X = x) = C(x − 1, r − 1) · p^r · (1 − p)^(x − r), for x = r, r + 1, r + 2, …
• Interpretation of Parameters (r, p):
   • r (Number of successes): the specified number of successes (k) to be achieved.
   • p (Probability of Success): the probability of success on each trial.
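A short sketch that evaluates this PMF directly and cross-checks it against SciPy, whose nbinom counts failures rather than trials (r = 3 and p = 0.5 are illustrative assumptions):

```python
from math import comb
from scipy import stats

def nb_trials_pmf(x, r, p):
    """P(X = x): probability that the r-th success occurs on trial x."""
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

r, p = 3, 0.5                           # illustrative values
print(nb_trials_pmf(5, r, p))           # C(4, 2) * 0.5**5 = 0.1875

# Cross-check: scipy.stats.nbinom counts failures, and x trials means x - r failures.
print(stats.nbinom.pmf(5 - r, r, p))    # also 0.1875
```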
C. Contrasting with Binomial Distribution
• The binomial distribution models the number of successes in a fixed number (n) of trials.
• In contrast, the negative binomial models the number of trials until a specified number of successes.
D. Examples and Applications
• Number of Failures Before k Successes:
   • In quality control, it could represent the number of defective products produced before k non-defective ones.
   • In a series of medical tests, it might signify the number of failed diagnoses before k correct ones.
   • In sports, it could represent the number of failed attempts before a team scores k goals.
The negative binomial distribution is a versatile tool in modeling
situations where the focus is on the number of trials needed to achieve
a specific number of successes, providing a more flexible framework
than the binomial distribution for certain scenarios.
Simulation and Modeling
A. Monte Carlo Simulations
Monte Carlo simulations involve using random sampling techniques to
model the probability distribution of an uncertain outcome. In the
context of discrete probability distributions like the negative binomial
distribution, Monte Carlo simulations are employed to generate a large
number of random samples and observe the resulting distribution. This
method is particularly useful for complex systems and situations where
analytical solutions are challenging to obtain.
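A minimal sketch of this idea: draw many random samples from a negative binomial distribution and compare the empirical frequencies with the exact PMF (the parameters, sample size, and seed are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
r, p, n_samples = 3, 0.2, 200_000      # illustrative parameters

# Monte Carlo: sample the number of failures before the r-th success many times
# and estimate probabilities from the empirical frequencies.
samples = rng.negative_binomial(r, p, size=n_samples)

for k in (0, 5, 10):
    estimate = np.mean(samples == k)
    exact = stats.nbinom.pmf(k, r, p)
    print(k, round(estimate, 4), round(exact, 4))
```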
B. Applications in Risk Analysis
1. Financial Markets:
   • Assessing the risk associated with investment portfolios by modeling the distribution of potential returns.
   • Evaluating the impact of market uncertainties on financial instruments.
2. Project Management:
   • Estimating project completion times and identifying potential bottlenecks.
   • Analyzing the risk of delays and cost overruns.
3. Healthcare:
   • Modeling the spread of diseases and assessing the effectiveness of different intervention strategies.
   • Analyzing the risk of epidemics and pandemics.
4. Environmental Sciences:
   • Evaluating the risk of natural disasters such as floods, earthquakes, or hurricanes.
   • Assessing the impact of climate change on ecosystems.
C. Software and Tools for Simulation
1. R: An open-source statistical programming language with
extensive packages for probability distributions and simulation.
2. Python:
   • Popular for its scientific computing libraries, such as NumPy and SciPy, which facilitate simulation studies.
3. MATLAB:
   • Widely used for mathematical modeling and simulations, offering toolboxes for probability and statistics.
4. Monte Carlo Simulation Software:
   • Dedicated software like @Risk, Crystal Ball, and Simul8 that streamlines the process of setting up and running Monte Carlo simulations.
Challenges and Limitations
A. Practical Issues in Applying Discrete Probability
Distributions
1. Complexity of Real-world Scenarios:
   • Simulating real-world situations accurately can be challenging due to the complexity of systems and interactions.
2. Resource Intensive:
   • Running a large number of simulations for precision may require significant computational resources.
B. Sensitivity to Assumptions
1. Model Assumptions:
   • The accuracy of simulations heavily depends on the assumptions made during modeling, and deviations can impact results.
2. Parameter Estimation:
   • The precision of parameter estimates for the probability distribution directly influences the reliability of simulations.
C. Data Requirements and Availability
1. Limited Data:
   • Insufficient data can hinder the accurate estimation of distribution parameters, leading to unreliable simulations.
2. Data Quality:
   • Poor-quality data may introduce biases and inaccuracies in the simulations.
3. Data Availability:
   • In some cases, obtaining relevant data for specific scenarios may be challenging or impossible.
Effectively addressing these challenges requires a careful balance
between model complexity, data quality, and the practical constraints
of the simulation process. Sensitivity analyses and robustness testing
are crucial to understanding the reliability of the simulation results.
Continuous Probability Distributions
The normal distribution, also known as the Gaussian
distribution or bell curve, is a continuous probability
distribution that is symmetric around its mean, μ. It is
characterized by its probability density function (PDF),
f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²)).
Its key properties are:
1. Symmetry: The normal distribution is symmetric
around its mean (μ), meaning that the left and right
tails are identical.
2. Bell-shaped curve: The probability density function
forms a bell-shaped curve, with the highest point at the
mean.
3. Mean and standard deviation: The mean (μ) specifies
the center of the distribution, and the standard
deviation (σ) controls the spread or dispersion. About
68% of the data falls within one standard deviation
of the mean, about 95% within two standard
deviations, and about 99.7% within three standard
deviations.
4. Area under the curve: The total area under the curve is
equal to 1, representing the probability that a random
variable falls within the entire range of possible values.
5. Central Limit Theorem: The sum (or average) of a large
number of independent and identically distributed
random variables, regardless of the original
distribution, will be approximately normally
distributed.
The normal distribution is widely used in statistics and
probability theory due to its mathematical tractability
and its emergence in many natural phenomena. Many
statistical methods assume that the underlying data
follows a normal distribution, and this assumption is
often reasonable due to the Central Limit Theorem.
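A minimal sketch that checks the 68-95-99.7 figures with SciPy and illustrates the Central Limit Theorem with means of simulated die rolls (the sample sizes and seed are illustrative assumptions):

```python
import numpy as np
from scipy import stats

# 68-95-99.7 rule: fraction of a normal distribution within k standard deviations.
for k in (1, 2, 3):
    frac = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f"within {k} sd: {frac:.4f}")        # ~0.6827, 0.9545, 0.9973

# Central Limit Theorem sketch: the mean of 1000 die rolls is approximately normal,
# even though a single roll is uniform on {1, ..., 6}.
rng = np.random.default_rng(0)
means = rng.integers(1, 7, size=(10_000, 1000)).mean(axis=1)
print(means.mean(), means.std())               # near 3.5 and sqrt(35/12)/sqrt(1000) ≈ 0.054
```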