Discrete Probability Distributions Reference

A CEE3030 lecture prepared by Gilberto E. Urroz, February 2006
The subjects presented are taken from the Maple worksheet entitled DiscreteProbabilityDistributions, available for download in the class schedule.
Quick review of concepts for discrete random variables - 1
● Let X be a discrete random variable, then
– $f(x) = P(X = x)$ is the probability mass function (pmf)
– $F(x) = P(X \le x) = \sum_{u \le x} f(u)$ is the cumulative distribution function (CDF)
Quick review of concepts for discrete random variables - 2
● Calculation of measures (x_1, x_2, ..., x_n are the possible values of X)
– Mean, $\mu = \sum_{i=1}^{n} x_i \cdot f(x_i)$
– Variance, $\sigma^2 = \sum_{i=1}^{n} (x_i - \mu)^2 \cdot f(x_i)$
– Skewness, $\alpha_3 = \frac{1}{\sigma^3} \sum_{i=1}^{n} (x_i - \mu)^3 \cdot f(x_i)$
– Kurtosis, $\alpha_4 = \frac{1}{\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^4 \cdot f(x_i)$
● Calculation of probabilities (valid when X takes integer values)
– P(X < x) = F(x-1)
– P(X ≤ x) = F(x)
– P(X > x) = 1-F(x)
– P(X ≥ x) = 1-F(x-1)
– P(a < X < b) = F(b-1)-F(a)
– P(a ≤ X < b) = F(b-1)-F(a-1)
– P(a < X ≤ b) = F(b)-F(a)
– P(a ≤ X ≤ b) = F(b)-F(a-1)
Discrete distributions in Maple
● Use the command ?Statistics,Distributions for a list of available distributions
● Discrete distributions of interest are:
– Bernoulli: Bernoulli distribution
– Binomial: binomial distribution
– DiscreteUniform: discrete uniform distribution
– EmpiricalDistribution: empirical distribution
– Geometric: geometric distribution
– Hypergeometric: hypergeometric distribution
– NegativeBinomial: negative binomial (Pascal) distribution
– Poisson: Poisson distribution
– ProbabilityTable: probability table
Using Maple Statistics package to define a discrete random variable
● To load the Statistics package use: with(Statistics)
● Use ?<distribution name> for help
– e.g., ?Geometric
● Define a random variable with function RandomVariable, giving the distribution name and the appropriate parameters
– e.g., X := RandomVariable(Binomial(n,p))
– e.g., X := RandomVariable(Poisson(3.2))
Calculating measures of a distribution - 1
● After defining a random variable X in Maple, you can calculate the following measures:
– μ := Mean(X)
– σ² := Variance(X)
– σ := StandardDeviation(X)
– α3 := Skewness(X)
– α4 := Kurtosis(X)
Calculating measures of a distribution - 2
● To obtain floating-point (decimal) results for the measures of a distribution you may use:
– μ := evalf(Mean(X))
– σ² := evalf(Variance(X))
– σ := evalf(StandardDeviation(X))
– α3 := evalf(Skewness(X))
– α4 := evalf(Kurtosis(X))
Calculating probabilities - 1
● To calculate probabilities use the following basic functions:
– ProbabilityFunction(X,a) for the pmf, i.e., f(a) = P(X=a)
– CDF(X,a) for the CDF, i.e., F(a) = P(X≤a)
Calculating probabilities - 2
● To calculate more complex probabilities use function CDF as follows:
– P(X < x) = F(x-1) => use CDF(X,x-1)
– P(X > x) = 1-F(x) => use 1-CDF(X,x)
– P(X ≥ x) = 1-F(x-1) => use 1-CDF(X,x-1)
– P(a < X < b) = F(b-1)-F(a) => use CDF(X,b-1)-CDF(X,a)
– P(a ≤ X < b) = F(b-1)-F(a-1) => use CDF(X,b-1)-CDF(X,a-1)
– P(a < X ≤ b) = F(b)-F(a) => use CDF(X,b)-CDF(X,a)
– P(a ≤ X ≤ b) = F(b)-F(a-1) => use CDF(X,b)-CDF(X,a-1)
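The CDF recipes only need F evaluated at integer arguments. The Python sketch below illustrates them under the assumption of an integer-valued X with a finite pmf stored as a dictionary; make_cdf and interval_prob are hypothetical helper names that play the role of Maple's CDF(X,a). Every recipe is checked against a brute-force sum of the pmf for a fair die.

```python
from itertools import product

def make_cdf(pmf):
    """Return F with F(x) = P(X <= x) for a pmf given as {x: f(x)}."""
    def F(x):
        return sum(f for u, f in pmf.items() if u <= x)
    return F

def interval_prob(F, a, b, include_a, include_b):
    """Probability that X lies between a and b (integer-valued X), using only F.

    An inclusive upper end uses F(b), a strict one uses F(b - 1).
    An inclusive lower end subtracts F(a - 1), a strict one subtracts F(a).
    """
    upper = F(b) if include_b else F(b - 1)
    lower = F(a - 1) if include_a else F(a)
    return upper - lower

# Example: fair six-sided die
die = {x: 1 / 6 for x in range(1, 7)}
F = make_cdf(die)

# Compare each CDF recipe with a direct sum of the pmf (a < b)
for a, b, inc_a, inc_b in product(range(0, 8), range(0, 8), (True, False), (True, False)):
    if a >= b:
        continue
    direct = sum(
        f for x, f in die.items()
        if (a <= x if inc_a else a < x) and (x <= b if inc_b else x < b)
    )
    assert abs(interval_prob(F, a, b, inc_a, inc_b) - direct) < 1e-12

p_gt_4 = 1 - F(4)                                  # P(X > 4) = 1 - F(4)
p_2_to_5 = interval_prob(F, 2, 5, True, False)     # P(2 <= X < 5) = F(4) - F(1)
```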
The Bernoulli distribution
● Random variable X can take only the values x = 0 and x = 1
● Probability mass function: $f(x) = p^x (1-p)^{1-x}$, for x = 0, 1, with 0 < p < 1 (i.e., f(1) = p and f(0) = 1 - p)
● Possible association of the values of x:
Variable X | X=0 equivalent | X=1 equivalent
Binary logical | No | Yes
Voltage level | Low voltage | High voltage
Success/failure | Failure | Success
Measures of the Bernoulli distribution
● $\mu = p$
● $\sigma^2 = p(1-p)$
● $\sigma = \sqrt{p(1-p)}$
● $\alpha_3 = \frac{1-2p}{\sqrt{p(1-p)}}$
● $\alpha_4 = \frac{1-3p+3p^2}{p(1-p)}$
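As a sanity check, the closed-form Bernoulli measures can be compared with the general summation definitions applied to the two-point pmf f(0) = 1 - p, f(1) = p. This Python sketch uses hypothetical function names and an arbitrary p = 0.3.

```python
import math

def bernoulli_closed_form(p):
    """(mu, sigma^2, sigma, alpha3, alpha4) from the Bernoulli formulas."""
    q = 1 - p
    return (p, p * q, math.sqrt(p * q),
            (1 - 2 * p) / math.sqrt(p * q),
            (1 - 3 * p + 3 * p ** 2) / (p * q))

def measures_from_pmf(pmf):
    """The same measures computed from the summation definitions."""
    mu = sum(x * f for x, f in pmf.items())
    var = sum((x - mu) ** 2 * f for x, f in pmf.items())
    sd = math.sqrt(var)
    a3 = sum((x - mu) ** 3 * f for x, f in pmf.items()) / sd ** 3
    a4 = sum((x - mu) ** 4 * f for x, f in pmf.items()) / sd ** 4
    return (mu, var, sd, a3, a4)

p = 0.3
closed = bernoulli_closed_form(p)
direct = measures_from_pmf({0: 1 - p, 1: p})
for c, d in zip(closed, direct):
    assert math.isclose(c, d, rel_tol=1e-9)   # formulas agree with definitions
```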
The binomial distribution: X~B(n,p)
● Consider n independent repetitions of a Bernoulli process with parameter p
● Let X = number of "successes" in n repetitions
● Probability mass function
$f(x) = \binom{n}{x} p^x (1-p)^{n-x}$, for x = 0, 1, ..., n
● Binomial coefficient: $\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$
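The binomial pmf can be written directly in Python with math.comb (Python 3.8 or later), which computes n!/(x!(n-x)!). In this sketch the values n = 10 and p = 0.3 are arbitrary illustrations; it checks that the probabilities add up to 1 and that the summation formulas for the mean and variance give np and np(1-p).

```python
from math import comb

def binomial_pmf(x, n, p):
    """f(x) = C(n, x) p^x (1 - p)^(n - x), for x = 0, 1, ..., n."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                            # should be 1
mean = sum(x * f for x, f in enumerate(pmf))                # should be n p = 3
var = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))   # should be n p (1 - p) = 2.1
```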
Measures of the binomial distribution
● $\mu = np$
● $\sigma^2 = np(1-p)$
● $\sigma = \sqrt{np(1-p)}$
● $\alpha_3 = \frac{1-2p}{\sqrt{np(1-p)}}$
● $\alpha_4 = 3 + \frac{1-6p(1-p)}{np(1-p)}$ (the worksheet shows a longer, equivalent expression)
The Poisson distribution
● Used to define discrete random variable X = number of occurrences of a certain phenomenon per unit time, unit length, etc.
● Probability mass function
$f(x) = \frac{\lambda^x e^{-\lambda}}{x!}$, for x = 0, 1, ...
● Parameter λ represents the average number of occurrences per unit time, length, etc.
Measures of the Poisson distribution
● $\mu = \lambda$
● $\sigma^2 = \lambda$
● $\sigma = \sqrt{\lambda}$
● $\alpha_3 = \frac{1}{\sqrt{\lambda}}$
● $\alpha_4 = 3 + \frac{1}{\lambda}$
Poisson distribution with scaling
● Let σ = average number of occurrences of a phenomenon per unit time (or per unit length, etc.)
● Let T = period of interest for the analysis
● Then X = number of occurrences during the period T is a Poisson random variable with parameter λ = σT
● See example of scaling in worksheet
Approximating the binomial distribution with the normal distribution, X~N(μ,σ)
● Applies for relatively large values of n such that np ≥ 5 and n(1-p) ≥ 5
● Use $\mu = np$ and $\sigma = \sqrt{np(1-p)}$, and use X := RandomVariable(Normal(μ,σ)) to define a normal random variable (continuous)
● Main reason for the approximation: to avoid calculating large factorial values – no longer an impediment with modern calculators and software
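A short Python sketch can show how close the normal approximation is when the conditions hold. It compares the exact binomial CDF with the normal CDF (written with math.erf) for n = 50 and p = 0.4, values chosen only for illustration (np = 20, n(1-p) = 30). The 0.5 shift is the usual continuity correction, an assumption added here that the slides do not mention; the helper names are hypothetical.

```python
import math

def binomial_cdf(x, n, p):
    """Exact F(x) = sum of C(n, k) p^k (1 - p)^(n - k) for k = 0..x."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))

def normal_cdf(x, mu, sigma):
    """Normal CDF expressed with the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

n, p = 50, 0.4                        # np = 20 >= 5 and n(1 - p) = 30 >= 5
mu = n * p                            # mu = n p
sigma = math.sqrt(n * p * (1 - p))    # sigma = sqrt(n p (1 - p))

x = 18
exact = binomial_cdf(x, n, p)
approx = normal_cdf(x + 0.5, mu, sigma)   # +0.5: continuity correction (assumption)
print(exact, approx)
```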
Approximating the binomial distribution with the Poisson distribution
● Applies for relatively large values of n and relatively small values of p; use λ = np
● Use X := RandomVariable(Poisson(λ)) to define a Poisson random variable (discrete)
● Main reason for the approximation: to avoid calculating large factorial values – no longer an impediment with modern calculators and software
Approximating the Poisson distribution with the normal distribution
● Similar to the approximation of the binomial distribution with the normal distribution (which applies for np ≥ 5 and n(1-p) ≥ 5); use μ = λ and σ = √λ
● Main reason for the approximation: to avoid calculating exponential functions in the Poisson distribution – no longer an impediment with modern calculators and software
● Read details in worksheet
The geometric distribution
● Consider several repetitions of a Bernoulli process with parameter p
● Let X = number of repetitions required for the first success
● Probability mass function
$f(x) = (1-p)^{x-1} p$, for x = 1, 2, ...
Measures of the geometric distribution
● $\mu = \frac{1}{p}$
● $\sigma^2 = \frac{1-p}{p^2}$
● $\sigma = \frac{\sqrt{1-p}}{p}$
● $\alpha_3 = \frac{2-p}{\sqrt{1-p}}$
● $\alpha_4 = \frac{p^2-9p+9}{1-p}$
Period of return - 1
● Let Xi = maximum value of an event in period i; the Xi are independent random variables
● Let q = P(Xi < x) = probability of non-exceedance of value x in period i
● Let p = P(Xi > x) = probability of exceedance of value x in period i; thus q + p = 1, q = 1 - p
● Let T = number of periods that pass until the value x is exceeded for the first time
● $P(T=t) = P(X_1<x)\,P(X_2<x) \cdots P(X_{t-1}<x)\,P(X_t>x) = q^{t-1}p = (1-p)^{t-1}p = f(t)$, a pmf
● T~geometric(p)
● Read details in worksheet
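The argument that T is geometric can be checked by simulation. In this Python sketch (hypothetical function names; p = 0.2 and the seed are arbitrary), each period exceeds x with probability p, independently of the others, and the simulated T is compared with f(1) = p and with the mean 1/p.

```python
import random

def geometric_pmf(t, p):
    """P(T = t) = (1 - p)^(t - 1) p, for t = 1, 2, ..."""
    return (1 - p) ** (t - 1) * p

def periods_until_exceedance(p, rng):
    """Simulate independent periods; return the index of the first exceedance."""
    t = 1
    while rng.random() >= p:   # no exceedance in this period (probability q = 1 - p)
        t += 1
    return t

rng = random.Random(2006)      # fixed seed so the run is reproducible
p, trials = 0.2, 100_000
samples = [periods_until_exceedance(p, rng) for _ in range(trials)]

mean_T = sum(samples) / trials          # should be close to 1/p = 5
freq_1 = samples.count(1) / trials      # should be close to f(1) = p = 0.2
```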
Period of return - 2
● Expected value of the geometric distribution with parameter p:
$E(T) = \frac{1}{p} = \frac{1}{P(X > x)}$
● Example: let X = magnitude of an annual flood, with p = P(X>x) = 0.010 for x = 500 cfs; then E(T) = 1/0.010 = 100 years
● Thus, the period of return of a 500-cfs flood is 100 years, or the 100-year flood is 500 cfs
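The flood example is simple arithmetic, but it can also be checked against the definition of the mean, E(T) = sum of t·f(t), truncated far into the tail. This Python sketch uses a hypothetical helper name, return_period.

```python
def return_period(p_exceed):
    """Period of return E(T) = 1/p, where p = P(X > x) in each period."""
    return 1.0 / p_exceed

p = 0.010                      # annual exceedance probability of 500 cfs (slide example)
T = return_period(p)           # 100 years

# Numerical check of E(T) = sum t (1 - p)^(t - 1) p; terms beyond t = 20000 are negligible
series = sum(t * (1 - p) ** (t - 1) * p for t in range(1, 20_000))
```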
The hypergeometric distribution
● Consider the following situation (shown as a figure in the original slide):
– Finite population of size N with a objects of a given type
– Draw a sample of size n
– Let X = number of objects of that type in the sample
● Probability mass function:
$f(x) = \frac{\binom{a}{x}\binom{N-a}{n-x}}{\binom{N}{n}}$
● Mean: $\mu = \frac{na}{N}$
● Variance: $\sigma^2 = \frac{n\,a\,(N-a)(N-n)}{N^2(N-1)}$
The discrete uniform distribution
● Let X = random variable taking the values x = a, a+1, ..., b, each value with equal probability
● The probability mass function is
$f(x) = \frac{1}{b-a+1}$, for x = a, a+1, ..., b
● Mean: $\mu = \frac{a+b}{2}$
● Variance: $\sigma^2 = \frac{(b-a)(b-a+2)}{12}$
Inverse cumulative distribution function
● The CDF of a random variable X is defined as F(x) = P(X≤x).
● Given a probability p = F(x), the value of x is defined as $x = F^{-1}(p)$
● $F^{-1}$ is the inverse cumulative distribution function (ICDF) of X
ICDF and Maple function Quantile
● For a discrete random variable X the p quantile is defined by
$Q(p) = \inf\{x \mid F(x) \ge p\}$
i.e., the smallest value of x such that F(x) is larger than or equal to p. This is calculated using Maple's function Quantile(X,p)
● If X takes only integer values, the ICDF for X is calculated using Maple's function Quantile as
$F^{-1}(p)$ = Quantile(X,p) - 1
Example - ICDF for the exponential distribution (continuous variable case)
● The probability density function (pdf) for this case is given by
$f(x) = \lambda e^{-\lambda x}$, x ≥ 0
● The corresponding cumulative distribution function (CDF) is
$F(x) = 1 - e^{-\lambda x}$
● For p = F(x), the ICDF is given by
$F^{-1}(p) = -\ln(1-p)/\lambda$
Fitting a distribution to a sample
● Xs = {x1, x2, ..., xns}, numerical sample of size ns
● Mean of the sample
$x_{mean} = \frac{1}{n_s} \sum_{i=1}^{n_s} x_i$
● Variance of the sample
$s^2 = \frac{1}{n_s - 1} \sum_{i=1}^{n_s} (x_i - x_{mean})^2$
● Select a distribution, make μ = xmean and σ² = s², and solve for the parameters of the distribution
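As one possible illustration of this procedure (the method of moments), the Python sketch below fits a binomial distribution: setting np = xmean and np(1-p) = s² gives p = 1 - s²/xmean and n = xmean/p. The sample is synthetic, drawn with a seeded generator from B(20, 0.3) purely for demonstration, and fit_binomial_moments is a hypothetical name. statistics.variance uses the 1/(ns-1) divisor shown above; since n must be a whole number, the estimate is rounded.

```python
import random
import statistics

def fit_binomial_moments(sample):
    """Match mu = n p and sigma^2 = n p (1 - p) to the sample mean and variance."""
    x_mean = statistics.mean(sample)
    s2 = statistics.variance(sample)    # sample variance with divisor ns - 1
    p_hat = 1 - s2 / x_mean
    n_hat = x_mean / p_hat
    return n_hat, p_hat

# Synthetic data: each value counts successes in 20 Bernoulli(0.3) trials
rng = random.Random(42)
true_n, true_p, ns = 20, 0.3, 20_000
sample = [sum(rng.random() < true_p for _ in range(true_n)) for _ in range(ns)]

n_hat, p_hat = fit_binomial_moments(sample)
n_fit = round(n_hat)    # number of trials must be an integer
```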
Random numbers
● Numbers generated by random processes, e.g., numbers out of a roulette or a lottery
● Computers use deterministic algorithms that produce pseudo-random numbers
● Use Maple function Sample(X,ns), within package Statistics, to produce a sample (vector) of size ns for the random variable X, e.g., Xs := Sample(X,ns)
● To convert from a vector to a list, use: convert(Xs,list)
Statistical simulation or Monte Carlo simulation
● Generating synthetic data out of a given distribution to use as input for a model
● Example 1 – generating precipitation data for a hydrological model
● Example 2 – generating hydraulic conductivity data for an aquifer in groundwater simulation
● Example 3 – generating traffic data for a highway operation simulation
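Maple's Sample(X,ns) does the sampling for you. To give a rough idea of how a sample can be produced from pseudo-random numbers, the Python sketch below uses inverse-transform sampling: each uniform number u from random.Random is mapped through the discrete quantile Q(u) = inf{x | F(x) ≥ u}. This is one standard technique, assumed here for illustration, not necessarily how Maple implements Sample; the Poisson(3.2) example and the function names are hypothetical.

```python
import math
import random

def poisson_quantile(u, lam):
    """Q(u) = inf{x | F(x) >= u}: smallest integer x whose Poisson CDF reaches u."""
    x = 0
    f = math.exp(-lam)      # f(0)
    F = f                   # F(0)
    while F < u and f > 0.0:
        x += 1
        f *= lam / x        # recursion f(x) = f(x - 1) * lam / x avoids large factorials
        F += f
    return x

def sample_poisson(lam, ns, rng):
    """Inverse-transform sampling: feed uniform pseudo-random numbers through Q."""
    return [poisson_quantile(rng.random(), lam) for _ in range(ns)]

rng = random.Random(3030)   # fixed seed: the "random" sample is reproducible
xs = sample_poisson(3.2, 10_000, rng)
sample_mean = sum(xs) / len(xs)   # should be close to lambda = 3.2
```

Fixing the seed makes the pseudo-random sequence repeatable, which is convenient when a simulation has to be rerun with the same synthetic input.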