Topic 4: Discrete Random Variables and Probability Distributions
CEE 11, Spring 2002, Dr. Amelia Regan

These notes draw liberally from the class text, Probability and Statistics for Engineering and the Sciences by Jay L. Devore, Duxbury, 1995 (4th edition).

Definition
For a given sample space S of some experiment, a random variable (rv) is any rule that associates a number with each outcome in S.

Random variables may have a finite or an infinite number of possible values. Examples: an rv whose possible values are {0, 1}; the number of tosses of a coin until the outcome "head" is first observed, which can be any positive integer.

A set is discrete if it consists of a finite number of elements, or if its elements can be listed in sequence so that there is a first element, a second element, a third element, and so on. A random variable is said to be discrete if its set of possible values is a discrete set.

Definition
The probability distribution or probability mass function (pmf) of a discrete rv X is defined for every number x by
$$p(x) = P(X = x) = P(\text{all } s \in S : X(s) = x)$$

The cumulative distribution function (cdf) F(x) of a discrete rv X with pmf p(x) is defined for every number x by
$$F(x) = P(X \le x) = \sum_{y:\, y \le x} p(y)$$

Examples
$$p(x) = \begin{cases} 0.30 & x = 0 \\ 0.12 & x = 1 \\ 0.25 & x = 2 \\ 0.33 & x = 3 \\ 0 & \text{otherwise} \end{cases} \qquad p(x) = \begin{cases} 0.5 & x = 0 \\ 0.5 & x = 1 \\ 0 & \text{otherwise} \end{cases} \qquad p(x) = \begin{cases} 1/n & x = 1, 2, \ldots, n \\ 0 & \text{otherwise} \end{cases}$$

Definition
Let X be a discrete rv with set of possible values D and pmf p(x). The expected value or mean value of X, denoted E(X) or $\mu_X$, is given by
$$E(X) = \mu_X = \sum_{x \in D} x\, p(x)$$

If the rv X has set of possible values D and pmf p(x), then the expected value of any function h(X), denoted E[h(X)] or $\mu_{h(X)}$, is computed by
$$E[h(X)] = \sum_{x \in D} h(x)\, p(x)$$
Note: according to Ross (1988, p. 255), this is known as the law of the unconscious statistician.

Example
Let X be a discrete rv with the following pmf and corresponding cdf:
$$p(x) = \begin{cases} 0.10 & x = 0 \\ 0.25 & x = 1 \\ 0.35 & x = 2 \\ 0.20 & x = 3 \\ 0.10 & x = 4 \\ 0 & \text{otherwise} \end{cases} \qquad F(x) = \begin{cases} 0 & x < 0 \\ 0.10 & 0 \le x < 1 \\ 0.35 & 1 \le x < 2 \\ 0.70 & 2 \le x < 3 \\ 0.90 & 3 \le x < 4 \\ 1.00 & x \ge 4 \end{cases}$$
$$E(X) = \sum_{x \in D} x\, p(x) = 0(0.10) + 1(0.25) + 2(0.35) + 3(0.20) + 4(0.10) = 1.95$$
Now let $h(x) = x^2 + 2$:
$$E[h(X)] = \sum_{x \in D} h(x)\, p(x) = 2(0.10) + 3(0.25) + 6(0.35) + 11(0.20) + 18(0.10) = 7.05$$

Class exercise
Let X be a discrete rv with the following pmf:
$$p(x) = \begin{cases} 0.15 & x = 0 \\ 0.30 & x = 1 \\ 0.55 & x = 2 \\ 0 & \text{otherwise} \end{cases}$$
Calculate the cdf F(x) of X.
Calculate $E(X) = \sum_{x \in D} x\, p(x)$.
Now let $h(x) = 3x^3 - 100$. Calculate $E[h(X)] = \sum_{x \in D} h(x)\, p(x)$.

Definition
For linear functions of X we have simple rules:
E(aX + b) = aE(X) + b
E(aX) = aE(X)
E(X + b) = E(X) + b

Variance of a random variable
If X is a random variable with mean μ, then the variance of X, denoted Var(X), is defined by
$$\mathrm{Var}(X) = E\left[(X - \mu)^2\right]$$
Recall that we previously defined the variance of a population as the average of the squared deviations from the mean. The expected value is nothing other than the average, or mean, so this form corresponds exactly to the one we used earlier.

It is often convenient to use a different form of the variance, which we derive by applying the rules of expected value we just learned and remembering that E[X] = μ:
$$\begin{aligned} \mathrm{Var}(X) &= E\left[(X - \mu)^2\right] = E\left[X^2 - 2\mu X + \mu^2\right] \\ &= E[X^2] - E[2\mu X] + E[\mu^2] \\ &= E[X^2] - 2\mu E[X] + \mu^2 \\ &= E[X^2] - 2\mu^2 + \mu^2 \\ &= E[X^2] - \mu^2 = E[X^2] - (E[X])^2 \end{aligned}$$

The Bernoulli distribution
Any random variable whose only two possible values are 0 and 1 is called a Bernoulli random variable.
Example: Suppose a set of buildings in a western city is examined for compliance with new, stricter earthquake engineering specifications.
After 25% of the city's buildings are examined at random, 12% are found to be out of code while 88% are found to conform to the new specifications, so it is supposed that buildings in the region have a 12% likelihood of being out of code. Let X = 1 if the next randomly selected building is within code and X = 0 otherwise. Then X is a Bernoulli random variable with parameter p = 0.88 (so that 1 - p = 0.12).

We write this as follows:
$$X = \begin{cases} 1 & \text{if the building is "to code"} \\ 0 & \text{otherwise} \end{cases}$$
$$p(0) = P(X = 0) = 0.12, \qquad p(1) = P(X = 1) = 0.88, \qquad p(x) = 0 \text{ for } x \ne 0, 1$$
that is,
$$p(x) = \begin{cases} 0.12 & x = 0 \\ 0.88 & x = 1 \\ 0 & \text{otherwise} \end{cases}$$

In general form, α is the parameter of the Bernoulli distribution, but we usually refer to this parameter as p, the probability of success:
$$p(x; \alpha) = \begin{cases} 1 - \alpha & x = 0 \\ \alpha & x = 1 \\ 0 & \text{otherwise} \end{cases}$$

The mean of the Bernoulli distribution with parameter p is
$$E(X) = \sum_{x} x\, p(x) = (1)p + (0)(1 - p) = p$$
The variance of the Bernoulli distribution with parameter p is
$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = (1^2)p + (0^2)(1 - p) - p^2 = p(1 - p)$$

The Binomial distribution
Now consider a random variable made up of successive independent Bernoulli trials, each with the same probability of success p, and define the random variable X as the number of successes among the n trials. The binomial distribution has the following probability mass function:
$$p(x; n, p) = \binom{n}{x} p^x (1 - p)^{n - x}, \qquad x = 0, 1, 2, \ldots, n$$
Remembering what we learned about combinations, this makes intuitive sense. The binomial coefficient $\binom{n}{x}$ is the number of ways to distribute the x successes among the n trials, $p^x$ is the probability that a particular set of x trials are all successes, and $(1 - p)^{n - x}$ is the probability that the remaining n - x trials are all failures.

Computing the mean and the variance of the binomial distribution is straightforward. First, remember that the binomial random variable is the sum of the outcomes (1 for a success, 0 for a failure) of n independent Bernoulli trials.
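The definitions so far (the cdf, the expected value, the law of the unconscious statistician, the shortcut variance formula) and the sum-of-Bernoulli-trials view of the binomial can all be checked numerically. The sketch below is a minimal Python illustration, assuming a pmf is stored as a dict mapping each possible value to its probability; the helper names (`cdf`, `expect`, `variance`, `binomial_pmf`, `sum_of_bernoullis`) are hypothetical, not part of the notes. `sum_of_bernoullis` builds the distribution of $X_1 + \cdots + X_n$ one independent trial at a time, which should reproduce the binomial pmf term by term.

```python
from math import comb, isclose

# Assumption: a pmf is a dict {possible value: probability}.

def cdf(pmf, x):
    """F(x) = P(X <= x): add p(y) over every possible value y <= x."""
    return sum(p for y, p in pmf.items() if y <= x)

def expect(pmf, h=lambda x: x):
    """E[h(X)] = sum of h(x) p(x) over the possible values (LOTUS)."""
    return sum(h(x) * p for x, p in pmf.items())

def variance(pmf):
    """Shortcut form Var(X) = E[X^2] - (E[X])^2."""
    mu = expect(pmf)
    return expect(pmf, lambda x: x * x) - mu * mu

def binomial_pmf(n, p):
    """p(x; n, p) = C(n, x) p^x (1 - p)^(n - x) for x = 0, ..., n."""
    return {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

def sum_of_bernoullis(n, p):
    """pmf of X_1 + ... + X_n for independent Bernoulli(p) trials,
    built by adding one trial at a time (a discrete convolution)."""
    dist = {0: 1.0}  # before any trial, the count is 0 with certainty
    for _ in range(n):
        nxt = {}
        for x, prob in dist.items():
            nxt[x] = nxt.get(x, 0.0) + prob * (1 - p)   # this trial fails
            nxt[x + 1] = nxt.get(x + 1, 0.0) + prob * p  # this trial succeeds
        dist = nxt
    return dist

# The worked example from the notes.
example = {0: 0.10, 1: 0.25, 2: 0.35, 3: 0.20, 4: 0.10}
print(cdf(example, 2))                       # F(2)
print(expect(example))                       # E(X)
print(expect(example, lambda x: x**2 + 2))   # E[h(X)] with h(x) = x^2 + 2

# A binomial with n = 5, p = 0.3 (illustrative values).
n, p = 5, 0.3
b, s = binomial_pmf(n, p), sum_of_bernoullis(n, p)
print(all(isclose(b[x], s[x]) for x in range(n + 1)))  # formula vs. convolution
print(expect(b), n * p)                      # mean vs. np
print(variance(b), n * p * (1 - p))          # variance vs. np(1 - p)
```

Floats are used for brevity, so comparisons go through `math.isclose`; `fractions.Fraction` probabilities would give exact agreement instead.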
Writing $X = X_1 + X_2 + \cdots + X_n$, where each $X_i$ is a Bernoulli rv with parameter p, we have
$$E(X) = E(X_1) + E(X_2) + \cdots + E(X_n) = p + p + \cdots + p = np$$
Because the trials are independent, the variances also add:
$$\mathrm{Var}(X) = (1)^2\mathrm{Var}(X_1) + (1)^2\mathrm{Var}(X_2) + \cdots + (1)^2\mathrm{Var}(X_n) = p(1 - p) + p(1 - p) + \cdots + p(1 - p) = np(1 - p)$$

[Figure: bar charts of the binomial pmf p(X = x) = b(x; 5, p) against x = 0, 1, ..., 5 for p = 0.1, 0.3, 0.5, 0.7, and 0.9]

Class exercise
A software engineer has historically delivered completed code to clients on schedule 40% of the time. If her performance record continues, the number of on-schedule completions in her next 6 jobs can be described by the binomial distribution.
Calculate the probability that exactly four jobs will be completed on schedule.
Calculate the probability that at most five jobs will be completed on schedule.
Calculate the probability that at least two jobs will be completed on schedule.

The Hypergeometric distribution
The binomial distribution is made up of independent trials in which the probability of success does not change. Another distribution in which the random variable X represents the number of successes among n trials is the hypergeometric distribution. The hypergeometric distribution assumes a fixed population in which the proportion, or number, of successes is known. We think of the hypergeometric distribution as involving trials without replacement, and of the binomial distribution as involving trials with replacement.
The classic illustration of the difference between the hypergeometric and the binomial distributions is that of black and white balls in an urn. Assume the proportion of black balls is p.
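A short simulation can make the urn illustration concrete. This is a minimal Python sketch, assuming a hypothetical urn of N = 10 balls of which M = 4 are black (so p = 0.4), n = 5 draws, and a fixed random seed; none of these numbers come from the notes. `random.choices` draws with replacement (each ball is effectively put back), while `random.sample` draws without replacement (each ball is set aside). Comparing the average and the spread of the two sets of counts is a useful warm-up for the mean and variance formulas that follow.

```python
import random
from statistics import mean, pvariance

# Hypothetical urn (illustrative numbers): N balls, M of them black.
N, M, n = 10, 4, 5
urn = [1] * M + [0] * (N - M)   # 1 = black ball, 0 = white ball
rng = random.Random(2002)       # fixed seed so the run is repeatable
reps = 50_000

# Black balls drawn when each ball is put back after selection.
with_repl = [sum(rng.choices(urn, k=n)) for _ in range(reps)]
# Black balls drawn when each ball is set aside after selection.
without_repl = [sum(rng.sample(urn, n)) for _ in range(reps)]

p = M / N
print("with replacement:    mean", mean(with_repl), "variance", pvariance(with_repl))
print("without replacement: mean", mean(without_repl), "variance", pvariance(without_repl))
print("n * p =", n * p)
```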
If we put each ball back in the urn after selection, the number of black balls selected in n trials is binomial(x; n, p); if we set each ball aside after selection, it is hypergeometric(x; n, M, N). (Engineering examples to come.)

The probability mass function of the hypergeometric distribution is given by the following, where M is the number of possible successes in the population, N is the total size of the population, n is the number of trials, and x is the number of successes in the n trials:
$$p(x; n, M, N) = \frac{\binom{M}{x}\binom{N - M}{n - x}}{\binom{N}{n}}$$

Example: Suppose that out of 100 bridges in a region, 30 have recently been retrofitted to be more secure during earthquakes. Ten bridges are selected at random from the 100 for inspection. What is the probability that at least three of these bridges will have been retrofitted?
$$P(X \ge 3) = 1 - \frac{\binom{30}{0}\binom{70}{10}}{\binom{100}{10}} - \frac{\binom{30}{1}\binom{70}{9}}{\binom{100}{10}} - \frac{\binom{30}{2}\binom{70}{8}}{\binom{100}{10}}$$

The mean and variance of the hypergeometric distribution are given below:
$$E(X) = n\frac{M}{N}, \qquad \mathrm{Var}(X) = \left(\frac{N - n}{N - 1}\right) n \frac{M}{N}\left(1 - \frac{M}{N}\right)$$

The Hypergeometric and the Binomial distributions
Note the following: sometimes we refer to M/N, the proportion of the population with a particular characteristic, as p. Then
$$E(X) = np, \qquad \mathrm{Var}(X) = \left(\frac{N - n}{N - 1}\right) np(1 - p)$$
This is very close to E(X) = np and Var(X) = np(1 - p), which are the mean and variance of the binomial distribution. In fact, we can think of (N - n)/(N - 1) as a correction term that accounts for the fact that in the hypergeometric distribution we sample without replacement.
Question: Should the variance of the hypergeometric distribution with proportion p be greater than or less than the variance of the binomial with parameter p? Can you give an intuitive reason why this is so?

Example: Binomial with p = 0.40 and three trials vs. hypergeometric with M/N = 2/5 = 0.40 and three trials.
Binomial:
$$P(X = 0) = \binom{3}{0}(0.40)^0(0.60)^3 = 0.216, \qquad P(X = 1) = \binom{3}{1}(0.40)^1(0.60)^2 = 0.432$$
$$P(X = 2) = \binom{3}{2}(0.40)^2(0.60)^1 = 0.288, \qquad P(X = 3) = \binom{3}{3}(0.40)^3(0.60)^0 = 0.064$$
Hypergeometric:
$$P(X = 0) = \frac{\binom{2}{0}\binom{3}{3}}{\binom{5}{3}} = 0.1, \qquad P(X = 1) = \frac{\binom{2}{1}\binom{3}{2}}{\binom{5}{3}} = 0.6$$
$$P(X = 2) = \frac{\binom{2}{2}\binom{3}{1}}{\binom{5}{3}} = 0.3, \qquad P(X = 3) = 0$$

If the number of trials n is small relative to the population size N, we can approximate the hypergeometric distribution with the binomial distribution in which p = M/N. This used to be very important. As recently as five years ago, computers and hand calculators could not calculate large factorials. The calculator I use cannot calculate 70!, and a few years ago 50! was out of the question. Despite improvements in calculators, it is still important to know that if the ratio of n to N is less than 5% (that is, we are sampling less than 5% of the population), then we can approximate the hypergeometric distribution with the binomial.
Check your calculators now: what is the maximum factorial they can handle?

Class exercise
The system administrator in charge of a campus computer lab has identified 9 machines out of 80 with defective motherboards. While he is on vacation, the lab is moved to a different room to make room for graduate student offices. The administrator kept notes about which machines were bad, based on their location in the old lab. During the move the computers were placed on their new desks in a random fashion, so all of the machines must be checked again. If the administrator checks three of the machines for defects, what is the probability that exactly one of the three will be defective? Calculate using both the hypergeometric distribution and the binomial approximation to the hypergeometric.

The Geometric distribution
The geometric distribution describes the random variable given by the number of consecutive Bernoulli trials until a success is achieved.
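The definition just given can be simulated directly. This is a minimal Python sketch, assuming p = 0.25, 100,000 runs, and a fixed seed (all illustrative choices, not from the notes); `trials_until_success` is a hypothetical helper. Each run performs independent Bernoulli(p) trials until the first success and records how many trials were needed, counting the success; subtracting 1 gives the number of trials prior to the success, the form used in our textbook. The two sample averages can be compared with the expected values given for the two forms.

```python
import random
from statistics import mean

def trials_until_success(p, rng):
    """Perform independent Bernoulli(p) trials until the first success;
    return the number of trials used, counting the successful one."""
    count = 1
    while rng.random() >= p:   # rng.random() < p counts as a success
        count += 1
    return count

p = 0.25
rng = random.Random(11)        # fixed seed so the run is repeatable
runs = [trials_until_success(p, rng) for _ in range(100_000)]

including = mean(runs)                # trials up to and including the success
prior = mean(r - 1 for r in runs)     # trials prior to the success (failures)
print(including, 1 / p)               # compare with 1/p
print(prior, (1 - p) / p)             # compare with (1 - p)/p
```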
Suppose that independent trials, each having probability p, 0 < p < 1, of being a success, are performed until a success occurs. If we let X equal the number of trials prior to the success (that is, the number of failures), then
$$p(x; p) = (1 - p)^x p, \qquad x = 0, 1, 2, \ldots$$
The above definition is the one used in our textbook. A more common definition is the following: let X equal the number of trials including the last trial, which by definition is a success. Then we get
$$p(x; p) = (1 - p)^{x - 1} p, \qquad x = 1, 2, \ldots$$

The expected value and variance of the geometric distribution are given, for the first form, by
$$E(X) = \frac{1 - p}{p}, \qquad \mathrm{Var}(X) = \frac{1 - p}{p^2}$$
For the second form, the expected value has more intuitive appeal. Can you convince yourself that the value is correct?
$$E(X) = \frac{1}{p}, \qquad \mathrm{Var}(X) = \frac{1 - p}{p^2}$$
Please note that the variance is the same in both cases. Explain why this is so.

We can derive E(X) for the second form from $E(X) = \sum_{x \in D} x\, p(x)$. The probability that the first success occurs on trial 1 is $p$, on trial 2 is $p(1 - p)$, on trial 3 is $p(1 - p)^2$, and so on. Let q = 1 - p. Then
$$\begin{aligned} E(X) &= (1)p + (2)pq + (3)pq^2 + \cdots \\ qE(X) &= pq + 2pq^2 + 3pq^3 + \cdots \\ E(X) - qE(X) &= p + pq + pq^2 + \cdots = p(1 + q + q^2 + q^3 + \cdots) = p\,\frac{1}{1 - q} = 1 \\ E(X)(1 - q) &= 1 \\ E(X)\,p &= 1 \\ E(X) &= \frac{1}{p} \end{aligned}$$
These steps are there so that we can work with the geometric series $1 + x + x^2 + x^3 + \cdots = 1/(1 - x)$, valid for $|x| < 1$, so $p(1 + q + q^2 + \cdots) = p/(1 - q)$. Here we just substitute p for 1 - q.

Try deriving the variance of the geometric distribution by finding E(X²).

Poisson Distribution
One of the most useful distributions for many branches of engineering is the Poisson distribution. The Poisson distribution is often used to model the number of occurrences during a given time interval or within a specified region. The time interval involved can have a variety of lengths, e.g., a second, minute, hour, day, year, and multiples thereof.
Poisson processes may be temporal or spatial. The region in question can be a line segment, an area, a volume, or some n-dimensional space.

Poisson processes or experiments have the following characteristics:
1. The number of outcomes occurring in any given time interval or region is independent of the number of outcomes occurring in any other disjoint time interval or region.
2. The probability of a single outcome occurring in a very short time interval or very small region is proportional to the length of the time interval or the size of the region. This value is not affected by the number of outcomes occurring outside this particular time interval or region.
3. The probability of more than one outcome occurring in a very short time interval or very small region is negligible.
Taken together, the first two characteristics are known as the "memoryless" property of Poisson processes.

Transportation engineers often assume that the number of vehicles passing by a particular point on a road is approximately Poisson distributed. Do you think that this model is more appropriate for a rural highway or a city street?

The pmf of the Poisson distribution is
$$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots, \text{ for some } \lambda > 0$$
The parameter λ is equal to αt, where α is the intensity of the process (the average number of events per time unit) and t is the number of time units in question. In a spatial Poisson process, α represents the average number of events per unit of space and t represents the number of spatial units in question.

For example, the number of vehicles crossing a bridge in a rural area might be modeled as a Poisson process. If the average number of vehicles per hour between 10:00 AM and 3:00 PM is 20, we might be interested in the probability that fewer than three vehicles cross between 12:30 and 12:45 PM. In this case λ = (20 per hour)(0.25 hours) = 5, and
$$P(X < 3) = \frac{e^{-5}5^0}{0!} + \frac{e^{-5}5^1}{1!} + \frac{e^{-5}5^2}{2!} = e^{-5}\left(1 + 5 + \frac{5^2}{2}\right) \approx 0.125$$

The expected value and the variance of the Poisson distribution are identical, and both are equal to λ.

Class exercise
An urban planner believes that the number of gas stations in an urban area is approximately Poisson distributed with parameter α = 3 per square mile. Let's assume she is correct in her assumption.
Calculate the expected number of gas stations in a four-square-mile region of the urban area, as well as the variance of this number.
Calculate the probability that this four-square-mile region has fewer than six gas stations.
Calculate the probability that, in four adjacent regions of one square mile each, at least two of the four regions contain more than three gas stations.
Do you think the situation is accurately modeled by a Poisson process? Why or why not?

Some random variables that typically obey the Poisson probability law (Ross, p. 130):
The number of misprints on a page (or group of pages) of a book
The number of people in a community living to 100 years of age
The number of wrong telephone numbers that are dialed in a day
The number of packages of dog biscuits sold in a particular store each day
The number of customers entering a post office (bank, store) in a given time period
The number of vacancies occurring during a year in the Supreme Court
The number of particles discharged in a fixed period of time from some radioactive material

WHAT ABOUT ENGINEERING? Poisson processes are at the heart of queueing theory, which is one of the most important topics in transportation and logistics. There are lots of other applications too: water, structures, geotech, etc.

The Poisson Distribution as an approximation to the Binomial
When n is large and p is small, the Poisson distribution with parameter λ = np is a very good approximation to the binomial distribution (the number of successes in n independent trials when the probability of success is equal to p for each trial).
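The Poisson pmf, the bridge example, and the approximation just stated can be checked with a few lines of Python. This is a minimal sketch using only the standard library; `poisson_pmf` and `binomial_pmf` are hypothetical helper names. The comparison uses n = 10 and p = 0.10, the same numbers as the example that follows.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) lam^x / x! for x = 0, 1, 2, ..."""
    return exp(-lam) * lam**x / factorial(x)

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x) for x = 0, ..., n."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Bridge example: alpha = 20 vehicles per hour, t = 0.25 hours, so lam = 5.
lam = 20 * 0.25
p_fewer_than_3 = sum(poisson_pmf(x, lam) for x in range(3))
print(round(p_fewer_than_3, 4))            # 0.1247

# Poisson approximation to the binomial: n = 10, p = 0.10, lam = np = 1.
n, p = 10, 0.10
exact = sum(binomial_pmf(x, n, p) for x in range(2))     # binomial P(X <= 1)
approx = sum(poisson_pmf(x, n * p) for x in range(2))    # Poisson P(X <= 1)
print(round(exact, 4), round(approx, 4))   # 0.7361 0.7358
```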
Example: Suppose that the probability that an item produced by a certain machine will be defective is 0.10. Find the probability that a sample of 10 items will contain at most one defective item.
$$P(X \le 1) = \binom{10}{0}(0.10)^0(0.90)^{10} + \binom{10}{1}(0.10)^1(0.90)^9 \approx 0.7361$$
With np = (10)(0.10) = 1.0, the Poisson approximation gives
$$P(X \le 1) \approx \frac{e^{-np}(np)^0}{0!} + \frac{e^{-np}(np)^1}{1!} = 2e^{-1} \approx 0.7358$$

References
Ross, S. (1988). A First Course in Probability. Macmillan.