Discrete Probability Distributions
A CEE3030 lecture prepared by Gilberto E. Urroz
February 2006

Reference
● The subjects presented are taken from the Maple worksheet entitled DiscreteProbabilityDistributions, available for download in the class schedule

Quick review of concepts for discrete random variables - 1
● Let X be a discrete random variable, then
– f(x) = P(X=x) is the probability mass function (pmf)
– $F(x) = P(X \le x) = \sum_{u \le x} f(u)$ is the cumulative distribution function (CDF)
● Calculation of probabilities (for integer-valued X)
– P(X < x) = F(x-1); P(X ≤ x) = F(x)
– P(X > x) = 1-F(x); P(X ≥ x) = 1-F(x-1)
– P(a < X < b) = F(b-1)-F(a)
– P(a ≤ X < b) = F(b-1)-F(a-1)
– P(a < X ≤ b) = F(b)-F(a)
– P(a ≤ X ≤ b) = F(b)-F(a-1)

Quick review of concepts for discrete random variables - 2
● Let X be a discrete random variable with pmf f(x) = P(X=x) and possible values x_1, ..., x_n
● Calculation of measures
– Mean: $\mu = \sum_{i=1}^{n} x_i \, f(x_i)$
– Variance: $\sigma^2 = \sum_{i=1}^{n} (x_i - \mu)^2 \, f(x_i)$
– Skewness: $\alpha_3 = \frac{1}{\sigma^3} \sum_{i=1}^{n} (x_i - \mu)^3 \, f(x_i)$
– Kurtosis: $\alpha_4 = \frac{1}{\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^4 \, f(x_i)$

Discrete distributions in Maple
● Use the command ?Statistics,Distributions for a list of available distributions
● Discrete distributions of interest are:
– Bernoulli: Bernoulli distribution
– Binomial: binomial distribution
– DiscreteUniform: discrete uniform distribution
– EmpiricalDistribution: empirical distribution
– Geometric: geometric distribution
– Hypergeometric: hypergeometric distribution
– NegativeBinomial: negative binomial (Pascal) distribution
– Poisson: Poisson distribution
– ProbabilityTable: probability table

Using Maple Statistics package to define a discrete random variable
● To load the Statistics package use: with(Statistics)
● Use ?<distribution name> for help
– e.g., ?Geometric
● Define a random variable with the distribution name and appropriate parameters using function RandomVariable
– e.g., X := RandomVariable(Binomial(n,p))
– e.g., X := RandomVariable(Poisson(3.2))

Calculating measures of a distribution - 1
● After defining a random variable X in Maple, you can calculate the following measures:
– μ := Mean(X)
– σ2 := Variance(X)
– σ := StandardDeviation(X)
– α3 := Skewness(X)
– α4 := Kurtosis(X)

Calculating measures of a distribution - 2
● To obtain floating-point (decimal) results for the measures of a distribution you may use:
– μ := evalf(Mean(X))
– σ2 := evalf(Variance(X))
– σ := evalf(StandardDeviation(X))
– α3 := evalf(Skewness(X))
– α4 := evalf(Kurtosis(X))

Calculating probabilities - 1
● To calculate probabilities use the following basic functions:
– ProbabilityFunction(X,a) for the pmf, i.e., f(a) = P(X=a)
– CDF(X,a) for the CDF, i.e., F(a) = P(X≤a)

Calculating probabilities - 2
● To calculate more complex probabilities use function CDF as follows:
– P(X < x) = F(x-1) => use CDF(X,x-1)
– P(X > x) = 1-F(x) => use 1-CDF(X,x)
– P(X ≥ x) = 1-F(x-1) => use 1-CDF(X,x-1)
– P(a < X < b) = F(b-1)-F(a) => use CDF(X,b-1)-CDF(X,a)
– P(a ≤ X < b) = F(b-1)-F(a-1) => use CDF(X,b-1)-CDF(X,a-1)
– P(a < X ≤ b) = F(b)-F(a) => use CDF(X,b)-CDF(X,a)
– P(a ≤ X ≤ b) = F(b)-F(a-1) => use CDF(X,b)-CDF(X,a-1)

The Bernoulli distribution
● Random variable X can take only the values x = 0 and x = 1
● Probability mass function: $f(x) = p^x (1-p)^{1-x}$, for x = 0, 1, with 0 < p < 1
● Possible association of the values of x:

Variable X    Binary logical equivalent    Voltage level    Success/failure
X = 0         No                           Low voltage      Failure
X = 1         Yes                          High voltage     Success

Measures of the Bernoulli distribution
● $\mu = p$
● $\sigma^2 = p(1-p)$
● $\sigma = \sqrt{p(1-p)}$
● $\alpha_3 = \dfrac{1-2p}{\sqrt{p(1-p)}}$
● $\alpha_4 = \dfrac{1-3p+3p^2}{p(1-p)}$

The binomial distribution: X~B(n,p)
● Consider n repetitions of a Bernoulli process with parameter p
● Let X = number of "successes" in n repetitions
● Probability mass function: $f(x) = \binom{n}{x} p^x (1-p)^{n-x}$, for x = 0, 1, ..., n
● Binomial coefficient: $\binom{n}{x} = \dfrac{n!}{x!\,(n-x)!}$

Measures of the binomial distribution
● $\mu = np$
● $\sigma^2 = np(1-p)$
● $\sigma = \sqrt{np(1-p)}$
● $\alpha_3 = \dfrac{1-2p}{\sqrt{np(1-p)}}$
● $\alpha_4$ = long expression, see worksheet

Approximating the binomial distribution with the normal distribution, X~N(μ,σ)
● Applies for np ≥ 5 and n(1-p) ≥ 5
● Main reason for the approximation: to avoid calculating large factorial values
– No longer an impediment with modern calculators and software
● Use X := RandomVariable(Normal(μ,σ)) to define a normal random variable (continuous)
● Read details in worksheet

The Poisson distribution
● Used to define discrete random variable X = number of occurrences of a certain phenomenon per unit time, unit length, etc.
● Probability mass function: $f(x) = \dfrac{\lambda^x e^{-\lambda}}{x!}$, for x = 0, 1, ...
● Parameter λ represents the average number of occurrences per unit time, length, etc.

Measures of the Poisson distribution
● $\mu = \lambda$
● $\sigma^2 = \lambda$
● $\sigma = \sqrt{\lambda}$
● $\alpha_3 = \dfrac{1}{\sqrt{\lambda}}$
● $\alpha_4 = 3 + \dfrac{1}{\lambda}$

Poisson distribution with scaling
● Let X = number of occurrences of a phenomenon, say, per unit time
● Let σ = average number of occurrences per unit time (here σ denotes a rate, not the standard deviation)
● Let T = period of interest for the analysis
● Use λ = σ T as the parameter in the Poisson distribution
● See example of scaling in worksheet

Approximating the binomial distribution with the Poisson distribution
● Applies for relatively large values of n and relatively small values of p so that np ≥ 5 or n(1-p) ≥ 5
● Main reason for the approximation: to avoid calculating large factorial values
– No longer an impediment with modern calculators and software
● Use X := RandomVariable(Poisson(λ)) to define a Poisson random variable (discrete)

Approximating the Poisson distribution with the normal distribution
● Similar to the approximation of the binomial distribution with the normal distribution
● Main reason for the approximation: to avoid calculating exponential functions in the Poisson distribution
– No longer an impediment with modern calculators and software
● Read details in worksheet

The geometric distribution
● Consider several repetitions of a Bernoulli process with parameter p
● Let X = number of repetitions required for the first success
● Probability mass function: $f(x) = (1-p)^{x-1} p$, for x = 1, 2, ...

Measures of the geometric distribution
● $\mu = \dfrac{1}{p}$
● $\sigma^2 = \dfrac{1-p}{p^2}$
● $\sigma = \dfrac{\sqrt{1-p}}{p}$
● $\alpha_3 = \dfrac{2-p}{\sqrt{1-p}}$
● $\alpha_4 = \dfrac{p^2-9p+9}{1-p}$

Period of return - 1
● Let X_i = maximum value of an event in period i, with X_1, X_2, ... independent random variables
● Let q = P(X_i < x) = probability of non-exceedance of value x in period i
● Let p = P(X_i > x) = probability of exceedance of value x in period i, thus q+p = 1, q = 1-p
● Let T = number of periods elapsed until the value x is first exceeded
● $P(T=t) = P(X_1<x)\,P(X_2<x) \cdots P(X_{t-1}<x)\,P(X_t>x) = q^{t-1} p = (1-p)^{t-1} p = f(t)$, a pmf
● T~geometric(p)

Period of return - 2
● Expected value of the geometric distribution with parameter p: $E(T) = \dfrac{1}{p} = \dfrac{1}{P(X>x)}$
● Example: let X = magnitude of an annual flood, with p = P(X>x) = 0.010 for x = 500 cfs, then E(T) = 1/0.010 = 100 years
● Thus, the period of return of a 500-cfs flood is 100 years, or the 100-year flood is 500 cfs

The hypergeometric distribution
● Consider (see figure):
– Finite population of size N with a objects of a given type
– Draw a sample of size n
– Let X = number of objects of the type in the sample
● Probability mass function: $f(x) = \dfrac{\binom{a}{x}\binom{N-a}{n-x}}{\binom{N}{n}}$
● Mean: $\mu = \dfrac{na}{N}$
● Variance: $\sigma^2 = \dfrac{n\,a\,(N-a)(N-n)}{N^2 (N-1)}$

The discrete uniform distribution
● Let X = random variable taking the values x = a, a+1, ..., b, each value with equal probability
● The probability mass function is $f(x) = \dfrac{1}{b-a+1}$, for x = a, a+1, ..., b
● Mean: μ = (a+b)/2
● Variance: σ2 = (a-b)(a-b-2)/12

Inverse cumulative distribution function
● The CDF of a random variable X is defined as F(x) = P(X ≤ x)
● Given a probability p = F(x), the value of x is defined as $x = F^{-1}(p)$
● $F^{-1}$ is the inverse cumulative distribution function (ICDF) of X

Example - ICDF for the exponential distribution (continuous variable case)
● The probability density function (pdf) for this case is given by $f(x) = \lambda e^{-\lambda x}$, x ≥ 0
● The corresponding cumulative distribution function (CDF) is $F(x) = 1 - e^{-\lambda x}$
● For p = F(x), the ICDF is given by $F^{-1}(p) = -\ln(1-p)/\lambda$

ICDF and Maple function Quantile
● For a discrete random variable X the p quantile is defined by Q(p) = inf{x | F(x) ≥ p}, i.e., the smallest value of x such that F(x) is larger than or equal to p. This is calculated using Maple's function Quantile(X,p)
● If X takes only integer values, the ICDF for X is calculated using Maple's function Quantile as $F^{-1}(p)$ = Quantile(X,p) - 1

Fitting a distribution to a sample
● Xs = {x_1, x_2, ..., x_ns}, numerical sample of size ns
● Mean of the sample: $x_{mean} = \dfrac{1}{ns} \sum_{i=1}^{ns} x_i$
● Variance of the sample: $s^2 = \dfrac{1}{ns-1} \sum_{i=1}^{ns} (x_i - x_{mean})^2$
● Select a distribution, make $\mu = x_{mean}$ and $\sigma^2 = s^2$, and solve for the parameters of the distribution

Random numbers
● Numbers generated by random processes, e.g., numbers out of a roulette wheel or a lottery
● Computers use deterministic algorithms that produce pseudo-random numbers
● Use Maple function Sample(X,ns), within package Statistics, to produce a sample (vector) of size ns for the random variable X, e.g., Xs := Sample(X,ns)
● To convert from a vector to a list, use: convert(Xs,list)

Statistical simulation or Monte Carlo simulation
● Generating synthetic data out of a given distribution to use as input for a model
● Example 1 - generating precipitation data for a hydrological model
● Example 2 - generating hydraulic conductivity data for an aquifer in groundwater simulation
● Example 3 - generating traffic data for a highway operation simulation
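
The exponential ICDF example and the slides on pseudo-random numbers and Monte Carlo simulation can be tied together in a short sketch. It is written in Python rather than Maple only so that it can be run and checked independently; it is not taken from the worksheet. The function names, the rate lam = 2.0, the seed, and the sample size are all assumptions made for illustration. The idea is to draw a uniform pseudo-random number p and pass it through F^-1(p) = -ln(1-p)/λ to get an exponential sample.

```python
import math
import random


def exponential_icdf(p, lam):
    """Inverse CDF of the exponential distribution, F^-1(p) = -ln(1 - p)/lam."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -math.log(1.0 - p) / lam


def sample_exponential(ns, lam, seed=2006):
    """Return ns pseudo-random exponential values obtained by inverting the CDF."""
    rng = random.Random(seed)  # deterministic generator: same seed, same sample
    return [exponential_icdf(rng.random(), lam) for _ in range(ns)]


lam = 2.0                      # hypothetical rate, chosen only for illustration
xs = sample_exponential(10_000, lam)
x_mean = sum(xs) / len(xs)     # should be close to the theoretical mean 1/lam
print(x_mean)
```

Fixing the seed makes the "random" sample reproducible, which illustrates the point that computers generate pseudo-random numbers with deterministic algorithms.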
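
The fitting recipe (set μ = x_mean and σ² = s², then solve for the parameters) can be sketched for the binomial case, where μ = np and σ² = np(1-p) give p = 1 - s²/x_mean and n = x_mean/p. This is a Python sketch under stated assumptions, not the worksheet's procedure: the function names and the small sample are made up for illustration, and the fitted n is generally not an integer, so in practice it would have to be rounded.

```python
def sample_mean_variance(xs):
    """Sample mean and sample variance (divisor ns - 1)."""
    ns = len(xs)
    x_mean = sum(xs) / ns
    s2 = sum((x - x_mean) ** 2 for x in xs) / (ns - 1)
    return x_mean, s2


def fit_binomial_by_moments(xs):
    """Solve n*p = x_mean and n*p*(1 - p) = s2 for n and p."""
    x_mean, s2 = sample_mean_variance(xs)
    if not 0.0 < s2 < x_mean:
        # A binomial has variance smaller than its mean; otherwise no fit exists.
        raise ValueError("sample variance must be positive and below the mean")
    p = 1.0 - s2 / x_mean
    n = x_mean / p
    return n, p


data = [2, 3, 1, 4, 2, 3, 2, 3]   # hypothetical counts, for illustration only
x_mean, s2 = sample_mean_variance(data)
n_fit, p_fit = fit_binomial_by_moments(data)
print(x_mean, s2, n_fit, p_fit)
```

The guard reflects a limitation of the method: a sample whose variance is not smaller than its mean cannot be matched by any binomial distribution, so another distribution would have to be selected.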
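
The CDF rules for interval probabilities, such as P(a < X < b) = F(b-1)-F(a), are easy to get wrong by one unit, so a numerical check can help. The Python sketch below builds a binomial pmf and CDF from the formulas in the slides and compares each rule against a direct sum of the pmf. The helper names and the values n = 10, p = 0.3, a = 2, b = 6 are assumptions chosen only for illustration; in Maple the same quantities would come from ProbabilityFunction and CDF.

```python
from math import comb


def binom_pmf(x, n, p):
    """f(x) = C(n, x) p^x (1 - p)^(n - x) for x = 0, 1, ..., n; zero elsewhere."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)


def binom_cdf(x, n, p):
    """F(x) = P(X <= x), the sum of f(u) over all u <= x."""
    return sum(binom_pmf(k, n, p) for k in range(0, min(x, n) + 1))


n, p = 10, 0.3   # hypothetical parameters
a, b = 2, 6      # hypothetical interval limits

# P(a < X < b): direct sum over x = a+1, ..., b-1 versus F(b-1) - F(a)
open_direct = sum(binom_pmf(k, n, p) for k in range(a + 1, b))
open_cdf = binom_cdf(b - 1, n, p) - binom_cdf(a, n, p)

# P(a <= X <= b): direct sum over x = a, ..., b versus F(b) - F(a-1)
closed_direct = sum(binom_pmf(k, n, p) for k in range(a, b + 1))
closed_cdf = binom_cdf(b, n, p) - binom_cdf(a - 1, n, p)

# P(X > a): direct sum over x = a+1, ..., n versus 1 - F(a)
upper_direct = sum(binom_pmf(k, n, p) for k in range(a + 1, n + 1))
upper_cdf = 1.0 - binom_cdf(a, n, p)

print(open_direct, open_cdf)
print(closed_direct, closed_cdf)
print(upper_direct, upper_cdf)
```

Each printed pair should agree to rounding error. In particular, the last pair confirms that P(X > a) uses F(a), not F(a-1).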