Chapter 5, Probability Distributions

5.1 Introduction

- In this chapter, we discuss various probability distributions, both discrete and continuous.
- A discrete probability distribution is used when the sample space is discrete, that is, finite or countably infinite. Following is a list of discrete probability distributions:
  - discrete uniform
  - binomial and multinomial
  - hypergeometric
  - negative binomial
  - geometric
  - Poisson
- A continuous probability distribution is used when the sample space is continuous. Following is a list of continuous probability distributions:
  - uniform
  - normal (or Gaussian)
  - Gamma
  - Beta
  - t distribution
  - F distribution
  - $\chi^2$ distribution

5.2 Discrete uniform distribution

- Definition: if a random variable (r.v.) $X$ assumes the values $x_1, x_2, \ldots, x_k$ with equal probabilities, then $X$ follows the discrete uniform distribution and its probability function is:

  $$f(x; k) = \frac{1}{k}, \quad x = x_1, x_2, \ldots, x_k$$

- The mean and variance:

  $$\mu = \frac{1}{k}\sum_{i=1}^{k} x_i, \qquad \sigma^2 = \frac{1}{k}\sum_{i=1}^{k} (x_i - \mu)^2$$

5.3 Binomial and multinomial distributions

- First, let us introduce the Bernoulli process. If:
  - the outcome of the process is either success ($X = 1$) or failure ($X = 0$), and
  - the probability of success is $P(X = 1) = p$ and the probability of failure is $P(X = 0) = 1 - p = q$,

  then the process is a Bernoulli process.
- The probability distribution of the Bernoulli process:

  $$p(x) = p^x (1 - p)^{1 - x}, \quad x = 0, 1 \text{ and } 0 < p < 1$$

- The mean and the variance: $E(X) = p$, $V(X) = p(1 - p)$.
- An example: what is the probability of picking a male student (8 of the 12 students are male)?
  - $X = 1$: male student, with probability $p = 8/12 = 2/3$
  - $X = 0$: female student, with probability $1 - p = 1/3$

  Thus, the probability distribution is:

  $$P(x) = (2/3)^x (1/3)^{1 - x}, \quad x = 0, 1$$

  In addition, the mean is $p = 2/3$ and the variance is $V = (2/3)(1/3) = 2/9$.
- Binomial distribution: the binomial distribution is defined based on the Bernoulli process. It is made up of $n$ independent Bernoulli trials.
  Suppose that $X_1, X_2, \ldots, X_n$ are independent Bernoulli random variables with the same $p$; then $Y = \sum_{i=1}^{n} X_i$ follows the binomial distribution. (Note that $Y$ is the number of successes among the $n$ trials.)
- The probability distribution of the binomial distribution is:

  $$P(Y = y) = \binom{n}{y} p^y (1 - p)^{n - y}, \quad y = 0, 1, \ldots, n$$

- The student example: pick three students from the 12 students. (Note that we must sample with replacement in order to ensure the same probability and independence on every pick.)
  - none of the 3 is male: the possibility is FFF; the probability is $\binom{3}{0}(1 - p)^3 = 0.037$
  - one of the 3 is male: the possibilities are MFF, FMF, FFM; the probability is $\binom{3}{1} p (1 - p)^2 = 3p(1 - p)^2 = 0.222$
  - two of the 3 are male: the possibilities are MMF, MFM, FMM; the probability is $\binom{3}{2} p^2 (1 - p) = 3p^2(1 - p) = 0.444$
  - all 3 are male: the possibility is MMM; the probability is $\binom{3}{3} p^3 = 0.296$

  In general, the formula is:

  $$P(Y = y) = \binom{3}{y} p^y (1 - p)^{3 - y}, \quad y = 0, 1, 2, 3$$

  We can derive the general formula in the same manner.
- Mean and variance of the binomial distribution:

  $$E(Y) = \sum_{i=1}^{n} E(X_i) = \sum_{i=1}^{n} p = np, \qquad V(Y) = \sum_{i=1}^{n} V(X_i) = \sum_{i=1}^{n} p(1 - p) = np(1 - p)$$

- The example: find the mean and variance of the number of male students picked, and then use Chebyshev's theorem to interpret the interval $\mu \pm 2\sigma$.
  - $\mu = (3)(2/3) = 2$
  - $\sigma^2 = (3)(2/3)(1/3) = 2/3$, $\sigma = 0.816$
  - at $k = 2$: $\mu + 2\sigma = 2 + (2)(0.816) = 3.63$ and $\mu - 2\sigma = 2 - (2)(0.816) = 0.37$, so the interval contains the integer values 1, 2, and 3
  - $1 - 1/k^2 = 3/4$

  Therefore, the probability that the number of male students picked is between 1 and 3 should be at least 3/4. Indeed, the actual probability is $p(1) + p(2) + p(3) = 0.963$.
- Using the binomial distribution table: the table is a function of $n$ and $p$.
- Multinomial distribution: this is an extension of the binomial distribution. Suppose each of $n$ independent trials results in one of $k$ possible outcomes with probabilities $p_1, p_2, \ldots, p_k$, and let $x_1, x_2, \ldots, x_k$ be the number of times each outcome occurs, where

  $$\sum_{i=1}^{k} x_i = n \quad \text{and} \quad \sum_{i=1}^{k} p_i = 1$$

  Then the counts follow the multinomial distribution with the probability distribution:

  $$f(x_1, x_2, \ldots, x_k; p_1, p_2, \ldots, p_k) = \binom{n}{x_1, x_2, \ldots, x_k} p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$

5.4 Hypergeometric Distribution

- The example: what is the probability of picking three male students in a row? Note that this time the picks are not independent, because we sample without replacement. As a result, we need to use the hypergeometric distribution. The following shows how the distribution is formed:
  - no male student among the 3 students: total $\binom{12}{3}$, male $\binom{8}{0}$, female $\binom{4}{3}$; probability $= \binom{8}{0}\binom{4}{3} \big/ \binom{12}{3}$
  - one male student among the 3 students: total $\binom{12}{3}$, male $\binom{8}{1}$, female $\binom{4}{2}$; probability $= \binom{8}{1}\binom{4}{2} \big/ \binom{12}{3}$
  - two male students among the 3 students: total $\binom{12}{3}$, male $\binom{8}{2}$, female $\binom{4}{1}$; probability $= \binom{8}{2}\binom{4}{1} \big/ \binom{12}{3}$
  - three male students among the 3 students: total $\binom{12}{3}$, male $\binom{8}{3}$, female $\binom{4}{0}$; probability $= \binom{8}{3}\binom{4}{0} \big/ \binom{12}{3}$

  In general, the probability distribution is as follows:

  $$P(Y = y) = \frac{\binom{8}{y}\binom{4}{3 - y}}{\binom{12}{3}}, \quad y = 0, 1, 2, 3$$

- The general formula of the hypergeometric distribution, for a population of size $N$ containing $k$ successes and a sample of size $n$:

  $$P(Y = y) = \frac{\binom{k}{y}\binom{N - k}{n - y}}{\binom{N}{n}}, \quad y = 0, 1, 2, \ldots, n$$

- The mean and the variance of the hypergeometric distribution:

  $$\mu = \frac{nk}{N}, \qquad \sigma^2 = \frac{N - n}{N - 1} \cdot n \cdot \frac{k}{N}\left(1 - \frac{k}{N}\right)$$

  As a special case, let $N$ go to infinity with $k/N = p$ fixed; then $(N - n)/(N - 1) \to 1$. Hence:

  $$\mu = np, \qquad \sigma^2 = np(1 - p)$$

  That is, the hypergeometric distribution becomes the binomial distribution.
- We can also define the multivariate hypergeometric distribution.

5.5 Negative Binomial and Geometric Distributions

- An example: picking three students (with replacement), what is the probability that the third student is the second male?
  - one possibility is FMM, and its probability is $(1 - p)p^2$
  - the other possibility is MFM, and its probability is $(1 - p)p^2$
  - note that there are $\binom{3 - 1}{2 - 1} = 2$ combinations, and hence the probability is:

  $$f(X = 3; k = 2) = \binom{3 - 1}{2 - 1}(1 - p)p^2$$

- The general formula for the negative binomial distribution is as follows:

  $$f(X = x) = \binom{x - 1}{k - 1} p^k (1 - p)^{x - k}, \quad x = k, k + 1, k + 2, \ldots$$

  where $x$ is the number of trials needed to obtain the $k$th success.
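The formulas of Sections 5.3 to 5.5 can be checked numerically for the 12-student example. The following is a minimal sketch using only the Python standard library; the function names are illustrative and not part of the notes.

```python
from math import comb

def binomial_pmf(y, n, p):
    """P(Y = y): y successes in n independent trials (sampling with replacement)."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

def hypergeometric_pmf(y, N, k, n):
    """P(Y = y): y successes in a sample of n drawn without replacement
    from a population of N items that contains k successes."""
    return comb(k, y) * comb(N - k, n - y) / comb(N, n)

def negative_binomial_pmf(x, k, p):
    """P(X = x): the k-th success occurs on trial number x."""
    return comb(x - 1, k - 1) * p**k * (1 - p)**(x - k)

# Student example: 12 students, 8 male, pick 3.
N, k, n = 12, 8, 3
p = k / N
for y in range(n + 1):
    print(y,
          round(binomial_pmf(y, n, p), 3),          # with replacement
          round(hypergeometric_pmf(y, N, k, n), 3))  # without replacement

# Probability that the third student picked is the second male.
print(round(negative_binomial_pmf(3, 2, p), 3))
```

The two columns differ because the population is small; as $N$ grows with $k/N$ fixed, the hypergeometric values approach the binomial ones.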
- The mean and variance of the negative binomial distribution (with $X$ counting the number of trials):

  $$E(X) = \frac{k}{p}, \qquad V(X) = \frac{k(1 - p)}{p^2}$$

- Another example: picking until we get a male student:
  - success on the first pick: $p$
  - success on the second pick: $(1 - p)p$
  - success on the third pick: $(1 - p)^2 p$
- The general formula is:

  $$f(X = x) = (1 - p)^{x - 1} p, \quad x = 1, 2, 3, \ldots$$

  This is the geometric distribution, the special case of the negative binomial distribution with $k = 1$.
- The mean and variance of the geometric distribution:

  $$E(X) = \frac{1}{p}, \qquad V(X) = \frac{1 - p}{p^2}$$

5.6 Poisson Distribution

- A Poisson process is a random process in which a discrete event takes place over continuous intervals of time or region. Examples of Poisson processes include:
  - the arrival of telephone calls at a switchboard,
  - cars passing an electronic checking device.

  Note that all these examples involve a discrete random event. In any given small period of time (or region), the probability that the event occurs is small; however, over a long time (or large region), the number of occurrences is large.
- The Poisson distribution plays an extremely important role in science and engineering, since it is an appropriate probabilistic model for a large number of observed phenomena.
- The Poisson distribution can be described by the following formula:

  $$p(x; \lambda t) = \frac{e^{-\lambda t} (\lambda t)^x}{x!}, \quad x = 0, 1, 2, \ldots$$

  where $\lambda$ is the average number of outcomes per unit time or region. Hence, $\lambda t$ represents the average number of outcomes in an interval of length $t$. Proof: refer to the textbook.
- The Poisson distribution can be considered as an approximation to the binomial distribution when $n$ is large and $p$ is small.
- From a physical point of view, take a time interval of length $T$, divide it into $n$ equal sub-intervals of length $\Delta t$ ($\Delta t \to 0$, with $T = n\,\Delta t$), and assume:
  - the probability of a success in any sub-interval $\Delta t$ is given by $\lambda\,\Delta t$;
  - the probability of more than one success in any sub-interval $\Delta t$ is negligible;
  - the probability of a success in any sub-interval does not depend on what happened prior to that time.

  Then, we have the Poisson distribution.
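The sub-interval argument above can be illustrated numerically: a binomial distribution with $n$ trials and $p = \lambda t / n$ gets closer to the Poisson distribution with mean $\lambda t$ as $n$ grows. The sketch below is only an illustration; the value $\lambda t = 3$ and the function names are assumptions chosen for the demonstration.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, mean):
    """P(X = x) for a Poisson distribution with the given mean (lambda * t)."""
    return exp(-mean) * mean**x / factorial(x)

def max_gap(n, mean, x_max=10):
    """Largest pointwise difference between the two pmfs for x = 0..x_max."""
    p = mean / n  # success probability in each of the n sub-intervals
    return max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, mean))
               for x in range(x_max + 1))

mean = 3.0  # assumed average number of events in the whole interval
for n in (10, 100, 10000):
    print(n, max_gap(n, mean))
```

The printed gap shrinks as $n$ increases, which is the sense in which the Poisson distribution approximates the binomial for large $n$ and small $p$.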
- Mean and variance of the Poisson distribution: both are equal to $\lambda t$, that is, $\mu = \lambda t$ and $\sigma^2 = \lambda t$.
- An example: in a large company, industrial accidents occur at a mean rate of three per week ($\lambda t = 3$). (Note that accidents occur independently.)
  - the probability distribution:

    $$p(y) = \frac{3^y e^{-3}}{y!}, \quad y = 0, 1, 2, \ldots$$

  - the probabilities can be determined by direct calculation or by checking the Poisson distribution table.
  - the probability of four or fewer accidents in a week: $p(0) + p(1) + p(2) + p(3) + p(4) = 0.815$
  - the probability of four or more: $P(Y \ge 4) = 1 - P(Y \le 3) = 0.353$
  - the probability of exactly four: $P(Y = 4) = P(Y \le 4) - P(Y \le 3) = 0.168$; note that this is the same as $p(4) = 0.168$

5.7 Uniform Distribution

- The uniform distribution is a continuous probability distribution.
  - the assumption: the random event is equally likely anywhere in an interval
  - an example: receiving an express mail between 1 and 5 pm
- The probability density function (pdf):

  $$f(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b \\ 0, & \text{elsewhere} \end{cases}$$

  By integration, we obtain the (cumulative) probability function (pf):

  $$F(x) = \begin{cases} 0, & x < a \\ \dfrac{x - a}{b - a}, & a \le x \le b \\ 1, & x > b \end{cases}$$

- A comparison between discrete and continuous distributions:
  - for a discrete r.v., we have the probability function $P(X = x) = p(x)$
  - for a continuous r.v., $P(X = x) = 0$, and

    $$F(x) = \int_{-\infty}^{x} f(t)\,dt, \qquad f(x) = \frac{dF(x)}{dx}$$

- An example: receiving an express mail, equally likely at any time between 1 and 5 pm.

  $$f(x) = \begin{cases} 1/4, & 1 \le x \le 5 \\ 0, & \text{elsewhere} \end{cases}$$

  Hence, the probability of receiving the express mail between 2 and 5 pm is $P(2 \le X \le 5) = F(5) - F(2) = (5 - 1)/(5 - 1) - (2 - 1)/(5 - 1) = 3/4$.
- The mean and the variance:

  $$E(X) = \frac{a + b}{2}, \qquad V(X) = \frac{(b - a)^2}{12}$$

5.8 Normal Distribution

- In the natural world there are more cases where the possible values are not equally likely. Instead, there is a most likely value, and the likelihood decreases symmetrically on both sides of it. This leads to the normal distribution.
- The normal distribution is by far the most widely used probability distribution. Why is the normal distribution so popular?
  - the central limit theorem: the sum of a large number of independent random effects is approximately normal
  - a linear combination of normal random variables is still normal
- The probability density function:

  $$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / (2\sigma^2)}$$

  Note that the probability function $F(x)$ does not have an analytical form; hence, we rely on numerical calculation (Table A.3).
- The mean, variance and standard deviation of a normal distribution:

  $$E(X) = \mu, \qquad V(X) = \sigma^2$$

  These two parameters uniquely determine the normal distribution. Hence, a normal distribution is often denoted as $N(\mu, \sigma)$.
- Illustration of the normal distribution:
  - the bell shape
  - the mean $\mu$
  - the standard deviation: $\mu \pm \sigma$ (68% of the area), $\mu \pm 2\sigma$ (95.4% of the area), and $\mu \pm 3\sigma$ (99.7% of the area)

  In particular, with $E(X) = 0$ and $V(X) = 1$ we have the standard normal distribution $N(0, 1)$.
- Calculating probabilities through the standard normal distribution:
  - transform the normal distribution to the standard normal distribution by

    $$Z = \frac{X - \mu}{\sigma}$$

  - use the normal distribution table (Table A.3)
- An example: given $N(16, 1)$, $P(X > 17) = ?$

  $Z = (X - 16)/1$, so $P[Z > (17 - 16)/1] = P(Z > 1) = 1 - P(Z < 1) = 1 - 0.8413$ (from Table A.3) $= 0.1587$.
- Questions:
  - given $\mu$ and $\sigma$, how do we calculate $P(c_1 \le X \le c_2)$?
  - given $p$, $\mu$ and $\sigma$, how do we calculate $x$ so that $P(X > x) = p$?
- Given a set of data, it is often necessary to check whether the data set follows a normal distribution.
- The student example, the number of hours of study per week of the 12 students:
  - sorting the data: 10, 12, 12, 14, 14, 14, 15, 15, 15, 20, 20, 25
  - note that there are just 6 different values, so each distinct value corresponds to a step of about $100/6 = 16.7$ percentage points (the percentiles below use steps of 16)
  - finding the percentiles of the data: 16, 32, 32, 48, 48, 48, 64, 64, 64, 80, 80, 96
  - finding the z-values of the percentiles: -1.0, -0.47, -0.47, -0.05, -0.05, -0.05, 0.36, 0.36, 0.36, 0.85, 0.85, 1.75
  - plotting:

    [Figure: normal probability plot of the data; horizontal axis: z-value (-1.5 to 2), vertical axis: hours of study (10 to 25)]

  Because the horizontal axis is taken from a normal distribution, a linear relationship indicates that the distribution of the data can be approximated by a normal distribution. If a data set follows a normal distribution, then the related probabilities can be calculated easily.
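The standardization step and the two questions raised in this section can also be worked without a table. The following is a minimal sketch that assumes Python's standard-library `statistics.NormalDist` in place of Table A.3; the helper names are illustrative and not part of the notes.

```python
from statistics import NormalDist

def prob_greater(x, mu, sigma):
    """P(X > x) for a normal r.v. with mean mu and standard deviation sigma."""
    return 1.0 - NormalDist(mu, sigma).cdf(x)

def prob_between(c1, c2, mu, sigma):
    """P(c1 <= X <= c2) for a normal r.v. with mean mu and std. deviation sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(c2) - dist.cdf(c1)

def cutoff_for_upper_tail(p, mu, sigma):
    """The value x such that P(X > x) = p."""
    return NormalDist(mu, sigma).inv_cdf(1.0 - p)

# The N(16, 1) example: P(X > 17).
print(round(prob_greater(17, 16, 1), 4))

# Area within k standard deviations of the mean, for k = 1, 2, 3.
for k in (1, 2, 3):
    print(k, round(prob_between(-k, k, 0, 1), 4))

# Inverse question: which x gives an upper-tail probability of 5%?
print(round(cutoff_for_upper_tail(0.05, 16, 1), 3))
```

`NormalDist` takes the standard deviation, not the variance, as its second argument, which matches the $N(\mu, \sigma)$ notation used here.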
- For the 12-student example: $\mu = 15.5$ and $\sigma^2 = 16$ (so $\sigma = 4$).

  Question: what is the probability of picking a student who studies at least 15 hours per week?

  Answer: we first calculate the z-value, $z = (15 - 15.5)/4 = -0.125$; hence, the probability is $P(Z > -0.125) = 1 - P(Z < -0.125) = 1 - 0.45 = 0.55$.
- As another example, assume that an exam is coming and everybody is putting in an extra 3 hours of study per week, so the mean becomes 18.5. What is the probability of picking a student who studies at least 20 hours per week?

  We first calculate the z-value, $z = (20 - 18.5)/4 = 0.375$; hence, $P(X > 20) = P(Z > 0.375) = 1 - P(Z < 0.375) = 1 - 0.646 = 0.354$.
- As an exercise, you may want to find, for a probability of 95%, the range of the hours of study per week for a picked student.
- Normal approximation to the binomial. When $n$ is large and $p$ is not too close to 0 or 1,

  $$Z = \frac{X - np}{\sqrt{np(1 - p)}}$$

  is approximately standard normal. This can be demonstrated by an example. In the student example, the probability of picking a student who studies more than 15 hours per week is $p = 3/12 = 1/4$. Consider sampling with replacement: the probability that exactly 3 of 12 picked students study more than 15 hours per week is

  $$b(X = 3; n = 12, p = 1/4) = \binom{12}{3}(1/4)^3(3/4)^9 = 0.258$$

  Using the normal distribution to approximate:
  - $\mu = np = (12)(1/4) = 3$
  - $\sigma^2 = np(1 - p) = (12)(1/4)(3/4) = 9/4 = 2.25$ ($\sigma = 1.5$)

  hence, $P(2.5 < X < 3.5) = P[(2.5 - 3)/1.5 < Z < (3.5 - 3)/1.5] = P(-0.333 < Z < 0.333) = 0.6306 - 0.3694 = 0.261$.

  It is seen that the results are very similar. The remaining approximation error is caused by the small $n$ ($n = 12$).
- The normal approximation of the binomial distribution is very useful when $n$ is large, because the binomial distribution then requires tedious calculation.

5.9 Exponential distribution, Gamma distribution and Chi-Square ($\chi^2$) distribution

- There are cases, for example the time to failure, in which the probability density decreases exponentially. This leads to the exponential distribution.
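- The probability density function of the exponential distribution:

  $$f(x) = \begin{cases} \dfrac{1}{\beta} \exp\left(-\dfrac{x}{\beta}\right), & x > 0,\ \beta > 0 \\ 0, & \text{elsewhere} \end{cases}$$

- The probability function:

  $$F(x) = 1 - \exp(-x/\beta), \quad x > 0,\ \beta > 0$$

- To calculate the mean and variance, we need the Gamma ($\Gamma$) function:

  $$\Gamma(\alpha) = \int_0^\infty x^{\alpha - 1} e^{-x}\,dx$$

  Using integration by parts: $(uv)' = u'v + uv'$, so $uv = \int u'v + \int uv'$, or $\int uv' = uv - \int u'v$.

  Let $u = x^{\alpha - 1}$ and $dv = e^{-x}dx$; it follows that, for $\alpha > 1$:

  $$\Gamma(\alpha) = \left[-e^{-x} x^{\alpha - 1}\right]_0^\infty + \int_0^\infty e^{-x} (\alpha - 1) x^{\alpha - 2}\,dx = (\alpha - 1)\Gamma(\alpha - 1)$$

  In particular:

  $$\Gamma(\alpha + 1) = \alpha\,\Gamma(\alpha), \qquad \Gamma(n) = (n - 1)!, \qquad \Gamma(1/2) = \sqrt{\pi}$$

  In general:

  $$\int_0^\infty x^{\alpha - 1} e^{-x/\beta}\,dx = \beta^\alpha\,\Gamma(\alpha)$$

  For the exponential distribution, since $\alpha = 1$:

  $$E(X) = \beta, \qquad V(X) = \beta^2$$

- The exponential distribution is related to the Poisson distribution: given a Poisson process with mean $\lambda t$, the waiting time until the first occurrence is exponentially distributed (with $\beta = 1/\lambda$).
- Another common case is that the probability density is low close to zero. This leads to the Gamma distribution. The probability density function of the Gamma distribution:

  $$f(x) = \frac{1}{\beta^\alpha\,\Gamma(\alpha)}\, x^{\alpha - 1} e^{-x/\beta}, \quad x > 0,\ \alpha > 0,\ \beta > 0$$

- The mean and variance:

  $$E(X) = \alpha\beta, \qquad V(X) = \alpha\beta^2$$

- Note that the exponential distribution is a special case of the Gamma distribution with $\alpha = 1$.
- Another special case of the Gamma distribution is the $\chi^2$ distribution. Letting $\alpha = \nu/2$ and $\beta = 2$ results in the $\chi^2$ distribution with $\nu$ degrees of freedom:

  $$f(x) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2 - 1} e^{-x/2}, \quad x > 0$$

  Its mean and variance are as follows:

  $$\mu = \nu, \qquad \sigma^2 = 2\nu$$

- Illustration:

  [Figure: pdf curves of the Gamma (or $\chi^2$) distribution and the exponential distribution]

5.10 Weibull distribution

- The assumption: similar to Gamma.
- The probability density function (with shape parameter $\alpha > 0$ and scale parameter $\beta > 0$):

  $$f(x) = \begin{cases} \dfrac{\alpha}{\beta}\, x^{\alpha - 1} e^{-x^\alpha/\beta}, & x > 0 \\ 0, & \text{otherwise} \end{cases}$$

- The probability function:

  $$F(x) = 1 - \exp(-x^\alpha/\beta), \quad x > 0$$

- The mean and variance:

  $$E(X) = \beta^{1/\alpha}\,\Gamma\left(1 + \frac{1}{\alpha}\right), \qquad V(X) = \beta^{2/\alpha}\left\{\Gamma\left(1 + \frac{2}{\alpha}\right) - \left[\Gamma\left(1 + \frac{1}{\alpha}\right)\right]^2\right\}$$

- Application in reliability, defining:
  - $f(t)$: the pdf of failure
  - $F(t)$: the pf of failure
  - $R(t) = 1 - F(t)$: the probability of no failure (reliability function)
  - $r(t) = f(t)/R(t)$: the failure rate function

  If the failure rate is constant,

  $$r(t) = \frac{f(t)}{R(t)} = \frac{f(t)}{1 - F(t)} = \frac{1}{\beta}$$

  then $f(t)$ will be exponential.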
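The link between a constant failure rate and the exponential distribution can be illustrated numerically. The sketch below evaluates $r(t) = f(t)/(1 - F(t))$ for an exponential distribution and, for contrast, for a Weibull distribution. The Weibull form $F(t) = 1 - \exp(-t^\alpha/\beta)$ and all parameter values are assumptions chosen for the illustration.

```python
from math import exp

def failure_rate(pdf, cdf, t):
    """r(t) = f(t) / R(t), where R(t) = 1 - F(t) is the reliability function."""
    return pdf(t) / (1.0 - cdf(t))

def exponential_pdf(t, beta):
    return exp(-t / beta) / beta

def exponential_cdf(t, beta):
    return 1.0 - exp(-t / beta)

# Assumed Weibull parameterization: F(t) = 1 - exp(-t**alpha / beta).
def weibull_pdf(t, alpha, beta):
    return (alpha / beta) * t**(alpha - 1) * exp(-t**alpha / beta)

def weibull_cdf(t, alpha, beta):
    return 1.0 - exp(-t**alpha / beta)

beta = 2.0   # illustrative scale parameter
alpha = 2.0  # illustrative Weibull shape parameter
for t in (0.5, 1.0, 5.0):
    r_exp = failure_rate(lambda s: exponential_pdf(s, beta),
                         lambda s: exponential_cdf(s, beta), t)
    r_wei = failure_rate(lambda s: weibull_pdf(s, alpha, beta),
                         lambda s: weibull_cdf(s, alpha, beta), t)
    print(t, r_exp, r_wei)
```

Under these assumptions the exponential failure rate stays at $1/\beta$ for every $t$, while the Weibull failure rate with $\alpha = 2$ grows with $t$.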
- Proof: since $dF(t)/dt = f(t)$, the constant failure rate condition $f(t)/(1 - F(t)) = 1/\beta$ gives

  $$\beta F'(t) = 1 - F(t), \quad \text{that is,} \quad \beta F'(t) + F(t) = 1$$

  Solving the above with $F(0) = 0$ gives:

  $$F(t) = 1 - \exp(-t/\beta), \quad t \ge 0$$

  or

  $$f(t) = \frac{1}{\beta}\exp(-t/\beta), \quad t \ge 0$$

5.11 Summary

- Discrete distributions
  - discrete uniform: equally likely
  - binomial and multinomial: number of successes in $n$ independent Bernoulli trials
  - hypergeometric: sampling is dependent (finite sample space, sampling without replacement)
  - negative binomial: number of trials until the $k$th success
  - geometric: number of trials until the first success
  - Poisson: discrete events in continuous intervals
- Continuous distributions
  - uniform: equally likely
  - normal: has a most likely value and decreases symmetrically
  - exponential: gradually decreasing
  - Gamma: small when close to zero (generalized exponential)
  - Beta: contained in a finite interval
  - Weibull: similar to Gamma (another generalization of the exponential)