Chapter 4 – Probability Distributions

Defn: A random variable is a real-valued function whose domain is the sample space of a random experiment. A random variable is called discrete if it has either a finite or a countably infinite number of possible values. If the number of possible values is uncountably infinite, then the random variable is called continuous.

Example 1: If the random experiment is to flip a fair coin twice, the sample space is S = {HH, HT, TH, TT}. Define a random variable X = number of heads that occur when I flip a fair coin twice. This is an example of a discrete random variable. Then X(HH) = 2, X(HT) = 1, X(TH) = 1, and X(TT) = 0.

Note: We will discuss continuous random variables in Chapter 5. In Chapter 4, we discuss discrete distributions, including certain useful families of discrete distributions.

Defn: The probability distribution of a discrete random variable X is a set of ordered pairs of numbers. In each pair, the first number is a possible value, x, of X, and the second number is the probability that the observed value of X will be x when we perform the random experiment. We can represent the distribution either as a function or (if the number of possible values of X is relatively small) as a table. As a function, we would write f(x) = P(X = x), for all possible values x of X. The probability distribution must satisfy the following two conditions:

i) f(x) ≥ 0, for all x, and
ii) ∑ f(x) = 1, where the sum is taken over all x.

The function f is also called the probability mass function (p.m.f.) of the random variable.

Example: Continuing the above example, we can find the probabilities associated with X by considering the probabilities associated with the outcomes in the sample space. Since we assume the coin is fair, each outcome in the sample space is equally likely to occur.
Therefore, we may represent the probability distribution of X in the following table:

x    f(x) = P(X = x)
0    0.25
1    0.50
2    0.25

The conditions above translate into saying that every number in the second column must be between 0 and 1, and the sum of the second column must be 1.

Defn: The cumulative distribution function (c.d.f.) (or distribution function) for a random variable X is defined by F(x) = P(X ≤ x), for −∞ < x < ∞. Note that the c.d.f. is defined for all real values of x, even though the random variable is discrete.

Example: Continuing the example of flipping a fair coin twice, we want to find the c.d.f. and construct its graph. The graph will be a step function.

Bernoulli Distribution

The simplest type of discrete distribution is one for which the r.v. has two possible values.

Defn: A discrete r.v. X is said to have a Bernoulli distribution with parameter p (X ~ Bernoulli(p)) if there are exactly two possible values, 0 and 1, of X, such that P(X = 1) = p and P(X = 0) = 1 − p.

Example: Our random experiment is to flip a fair coin once. We define X = number of heads. Then X ~ Bernoulli(0.5), so P(X = 1) = 0.5 and P(X = 0) = 0.5.

Binomial Distribution

Assume that instead of flipping the fair coin once or twice, we flip it 10 times. The sample space of the experiment has 2^10 = 1024 possible outcomes. The number of events that could be defined is then 2^1024. These numbers are rather unwieldy to work with directly. Hence we define a random variable X to be the number of heads that occur when we flip a fair coin 10 times. We want to be able to calculate probabilities associated with X.

Defn: A discrete r.v. X is said to have a binomial distribution with parameters n and p if

P(X = x) = C(n, x) p^x (1 − p)^(n−x), for x = 0, 1, …, n,

where C(n, x) denotes the binomial coefficient, defined below.

Derivation of the Binomial Distribution: A binomial experiment is a random experiment which satisfies the following conditions:

1) The experiment consists of a fixed number, n, of trials.
2) The trials are identical to each other, in that they are all performed the same way.
3) The trials are independent of each other, meaning that the outcome of one trial gives us no information about the outcome of any other trial.
4) Each trial has two possible outcomes, which we will call Success and Failure.
5) The probability of Success is the same, p, for each of the trials.

Note that the author states only three conditions. Please use the five conditions listed above when checking to see whether a random experiment is a binomial experiment.

We let X = # of Successes in the n trials. The possible values of X are 0, 1, 2, 3, …, n. For a given x ∈ {0, 1, 2, …, n}, what is P(X = x)?

One way that we can have exactly x Successes out of n trials is for the first x trials to result in Success and the remaining n − x trials to result in Failure. If the trials are independent (the outcome of one trial is unrelated to the outcome of any other trial), then

P(x Successes followed by n − x Failures) = p^x (1 − p)^(n−x).

Any other ordering of x Successes and n − x Failures will have the same probability of occurring. How many such orderings are there?

Defn: Given a set of n objects, the number of ways to choose a subset of x of the objects is given by the binomial coefficient:

C(n, x) = n! / (x! (n − x)!).

The number of different orderings of x Successes and n − x Failures is the same as the number of ways of choosing x of the n trials to be Successes. Hence, the probability that there will be exactly x Successes in n Bernoulli trials is given by:

P(X = x) = C(n, x) p^x (1 − p)^(n−x), for x = 0, 1, …, n.

Example: Let's go back to our random experiment of flipping a fair coin 10 times. Let X = number of heads that occur. Does this satisfy the conditions of being a binomial experiment? We have

P(X = 5) = C(10, 5) (0.5)^5 (0.5)^5 = 0.24609375.

What about P(X ≤ 5)? We find

P(X ≤ 5) = ∑ C(10, x) (0.5)^x (0.5)^(10−x), summed over x = 0, 1, …, 5, which equals 0.6230.

Clearly the calculations can become tedious.

To find binomial probabilities using Excel: If X ~ binomial(n, p), and we want to find P(X ≤ x), then in a cell of the worksheet, enter =BINOMDIST(x, n, p, TRUE).

In our example, we want to find P(X ≤ 5). In cell A1, we enter =BINOMDIST(5,10,0.5,TRUE). We get 0.6230.

If we want P(X = 5), we enter =BINOMDIST(5,10,0.5,TRUE)-BINOMDIST(4,10,0.5,TRUE). We get 0.2461.

If we want P(X > 5), we enter =1-BINOMDIST(5,10,0.5,TRUE). We get 0.3770.

To find binomial probabilities using the TI-83/84 calculator: If X ~ binomial(n, p), and we want to find P(X = x) or P(X ≤ x), refer to the calculator handout.

Examples of Binomial Experiments:

1) Assume that the date is October 15, 2012. We want to predict the outcome of the Presidential election. We will assume, for simplicity, that there are only two candidates, President Barack Obama and former Governor Sarah Palin. We select a simple random sample of n = 1068 voters from the population of all eligible, registered, and likely voters. We ask each voter in the sample, "Do you intend to vote for President Obama?" Let X = number of voters in the sample who plan to vote to re-elect President Obama. Is this a binomial experiment? We need to check to see whether each of the five conditions is satisfied.

2) A worn machine tool produces 1% defective parts. We select a simple random sample of 25 parts produced by this machine, and let X = number of defective parts in the sample.

3) I give a pop quiz to the class consisting of 10 multiple choice questions, each with four possible responses, only one of which is the correct response. A student has been goofing off all semester, and comes to class totally unprepared for the quiz. He decides to randomly guess the answer to each question. Let X = his score on the quiz.

4) It is known that of the entire population of adults in Florida, 5% have a certain blood type.
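If Excel or a calculator is not at hand, the same probabilities can be computed directly from the p.m.f. Here is a minimal sketch in Python using only the standard library (the function names are mine, not from the text):

```python
from math import comb

def binom_pmf(x, n, p):
    # P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    # P(X <= x): sum the p.m.f. over 0, 1, ..., x
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

# Ten flips of a fair coin; X = number of heads
print(binom_pmf(5, 10, 0.5))                 # 0.24609375
print(round(binom_cdf(5, 10, 0.5), 4))       # P(X <= 5) = 0.623
print(round(1 - binom_cdf(5, 10, 0.5), 4))   # P(X > 5)  = 0.377
```

This mirrors the Excel calls above: `binom_cdf` plays the role of BINOMDIST with TRUE as the last argument.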
We select a random sample of adults in Florida and obtain blood samples to test. Let X = number of people in the sample who have the blood type.

Example: p. 88, first example.

The Hypergeometric Distribution

Consider the following situation. You are working in quality control for a manufacturer of PC's. Your company has just received a large shipment (of size N) of identical electronic components that are to be used in producing a particular type of PC. Your company has a policy that, if the proportion of defective components in a shipment exceeds 2%, the shipment will not be accepted. It is impractical to test each of the N components for defects. Hence, you will select a random sample of size n from the shipment and test each component in the sample for defects. If the number of defective components in the shipment is a, how likely is it that you will discover a certain number, x, of defective components in the sample?

You will select the sample from the shipment in such a way that, after each component is selected, each of the remaining components in the shipment has an equal chance of being selected next. This is an exercise in sampling without replacement. If we replaced each component selected before selecting the next, so that a component could possibly be selected more than once, then we would be doing sampling with replacement, and the binomial distribution would be the correct distribution to use.

The total number of ways of selecting a sample of size n from the shipment is given by C(N, n). If there are a defective components in the shipment and N − a non-defective components, then the number of ways of choosing a sample of size n and obtaining x defective components and n − x non-defective components is, using our counting rules, C(a, x) C(N − a, n − x).

Defn: A discrete random variable X is said to have a hypergeometric distribution with parameters N, a, and n if the p.m.f.
of X is given by

f(x) = P(X = x) = C(a, x) C(N − a, n − x) / C(N, n),

for x = 0, 1, 2, 3, …, n, subject to the conditions that x cannot exceed a and n − x cannot exceed N − a.

Note: If the sample size is much smaller than the population size, then we may approximate the Hypergeometric(N, a, n) distribution with a Binomial(n, a/N) distribution. (In fact, in almost all cases in which a binomial distribution is used, we are approximating a hypergeometric. The binomial distribution is mathematically more tractable than the hypergeometric distribution.)

Example: p. 90, at top.

Mean and Variance of a Discrete Distribution

Defn: The mean, or expectation, or expected value, of a discrete r.v. X is given by

μ = ∑ x_i f(x_i), summed over i = 1, …, n.

Defn: The variance of a discrete r.v. X is given by

σ² = ∑ (x_i − μ)² f(x_i), summed over i = 1, …, n.

The standard deviation of X is just the square root of the variance.

Note: It is generally easier to calculate the variance using the equivalent formula

σ² = ∑ x_i² f(x_i) − μ².

Example: The random experiment is to flip a fair coin twice. Let X = number of heads. We found the distribution of X earlier. We want to find the expected number of heads, and the variance and standard deviation of X. First, the mean is given by

μ = ∑ x f(x) = (0)(0.25) + (1)(0.50) + (2)(0.25) = 1.

After we have calculated the mean, we need to calculate the second moment of the distribution:

∑ x² f(x) = (0)(0.25) + (1)(0.50) + (4)(0.25) = 1.5.

If we subtract the square of the mean from the second moment, we obtain the variance:

σ² = ∑ x² f(x) − μ² = 1.5 − 1 = 0.50,

and the standard deviation is

σ = √0.50 = 0.7071.

Interpretation of the mean of the distribution: The random experiment is to flip a fair coin twice and count the number of heads that occur. If we perform this experiment repeatedly, very many times, the average of the counts obtained will get closer and closer to 1.
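The mean and variance formulas above translate directly into code. The following sketch (the helper name `mean_var` is mine) applies them to the coin-flip p.m.f.:

```python
from math import sqrt

def mean_var(pmf):
    # pmf: dict mapping each possible value x to f(x)
    mu = sum(x * f for x, f in pmf.items())             # mu = sum of x * f(x)
    second_moment = sum(x**2 * f for x, f in pmf.items())
    var = second_moment - mu**2                          # sigma^2 = E[X^2] - mu^2
    return mu, var

# X = number of heads in two flips of a fair coin
pmf = {0: 0.25, 1: 0.50, 2: 0.25}
mu, var = mean_var(pmf)
print(mu, var, round(sqrt(var), 4))  # 1.0 0.5 0.7071
```

Note that the shortcut formula σ² = E[X²] − μ² is used here, matching the hand calculation in the example.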
The above example is a special case of the binomial distribution. If X ~ Binomial(n, p), then

μ = E[X] = np and σ² = Var(X) = np(1 − p).

Derivation of the mean of a binomial distribution: By definition,

μ = ∑ x f(x) = ∑ x · [n! / (x! (n − x)!)] p^x (1 − p)^(n−x), summed over x = 0, 1, …, n.

The first term in the sum is 0, giving

μ = ∑ x · [n! / (x! (n − x)!)] p^x (1 − p)^(n−x), summed over x = 1, …, n.

We will factor out n and p from the sum, and use the fact that

x / x! = 1 / (x − 1)!,

giving

μ = np ∑ [(n − 1)! / ((x − 1)! ((n − 1) − (x − 1))!)] p^(x−1) (1 − p)^((n−1)−(x−1)), summed over x = 1, …, n.

Next, we change the variable to z = x − 1. We find

μ = np ∑ [(n − 1)! / (z! ((n − 1) − z)!)] p^z (1 − p)^((n−1)−z), summed over z = 0, 1, …, n − 1.

But the sum is just the sum of probabilities, over all possible values, of a random variable Z that has a Binomial(n − 1, p) distribution. Hence, the sum is 1, and we have μ = np.

Derivation of the variance (using the same technique) is left as an exercise.

Example 1: Assume that the date is October 15, 2012. We want to predict the outcome of the Presidential election. We will assume, for simplicity, that there are only two candidates, President Barack Obama and former Governor Sarah Palin. We select a simple random sample of n = 1068 voters from the population of all eligible, registered, and likely voters. We ask each voter in the sample, "Do you intend to vote for President Obama?" Let X = number of voters in the sample who plan to vote to re-elect President Obama. The expected number of voters in the sample who will vote to re-elect President Obama is μ = np = (1068)(p). If his level of support in the 2012 election were the same as in the 2008 election, then we would expect that μ = np = (1068)(0.53) = 566.04 voters in the sample would vote to re-elect the President. The standard deviation of the distribution would be

σ = √(np(1 − p)) = √((1068)(0.53)(0.47)) = 16.3107.

Example 2: A worn machine tool produces 1% defective parts.
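The closed forms μ = np and σ² = np(1 − p) can be checked numerically against the definitional sums. This sketch (helper name mine) uses the machine-tool parameters n = 25, p = 0.01 that appear in Example 2:

```python
from math import comb

def binom_pmf(x, n, p):
    # P(X = x) = C(n, x) * p^x * (1 - p)^(n - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 25, 0.01
# Mean and variance computed directly from the definitions:
mu = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
var = sum(x**2 * binom_pmf(x, n, p) for x in range(n + 1)) - mu**2
print(round(mu, 6), round(var, 6))  # 0.25 0.2475, i.e. n*p and n*p*(1-p)
```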
We select a simple random sample of 25 parts produced by this machine, and let X = number of defective parts in the sample. X ~ Binomial(n = 25, p = 0.01). Therefore, the expected number of defective parts in the sample would be μ = np = (25)(0.01) = 0.25, and the standard deviation of X would be σ = √((25)(0.01)(0.99)) = 0.4975.

What if we selected such a sample and found that there were 2 defective parts in the sample? The value x = 2 is about 3.5176 standard deviations greater than the expected value under the assumption that the defect rate is 1%. We would then conclude that the actual defect rate is likely to be higher than 1% (assuming that the sampling was done randomly).

Let the discrete random variable X ~ Hypergeometric(N, a, n). Then the mean of the distribution is

μ = n(a/N),

and the variance of the distribution is

σ² = n(a/N)(1 − a/N)((N − n)/(N − 1)).

Do these formulae look familiar?

Example: p. 98.

Defn: Let X be a discrete random variable with p.m.f. f(x). We define the kth moment about the origin to be

μ′_k = ∑ x^k f(x), summed over all x.

We also define the kth central moment (or the kth moment about the mean) as

μ_k = ∑ (x − μ)^k f(x), summed over all x.

The first moment about the origin is just the mean of the distribution. The second central moment is the variance of the distribution. Third moments are related to the skewness of the distribution (see page 89).

Chebyshev's (aka Tchebychev's) Theorem

Theorem 4.1: If a probability distribution has mean μ and standard deviation σ < +∞, then for any k ≥ 1, the probability of obtaining a value of X that deviates from μ by at least kσ is at most 1/k². Symbolically, we write

P(|X − μ| ≥ kσ) ≤ 1/k².

Equivalently, we can say that

P(|X − μ| < kσ) ≥ 1 − 1/k².

Example: A worn machine tool produces 1% defective parts. We select a simple random sample of 25 parts produced by this machine, and let X = number of defective parts in the sample. X ~ Binomial(n = 25, p = 0.01).
Therefore, the expected number of defective parts in the sample would be μ = 0.25, and the standard deviation of X would be σ = √((25)(0.01)(0.99)) = 0.4975.

Let k = 2. Then we find that

P(|X − μ| < kσ) = P(μ − 2σ < X < μ + 2σ) ≥ 1 − 1/4 = 0.75.

Assuming that the 1% defect rate is true, the probability that the measured value of X will differ from the expected count, μ = 0.25, by less than 2σ = 0.995, is at least 75%.

Let k = 3. Then we find that

P(|X − μ| < kσ) = P(μ − 3σ < X < μ + 3σ) ≥ 1 − 1/9 = 0.8889.

Assuming that the 1% defect rate is true, the probability that the measured value of X will differ from the expected count, μ = 0.25, by less than 3σ = 1.4925, is at least 88.89%.

Poisson Distribution

This distribution provides the model for the occurrence of rare events over a period of time, distance, or some other dimension. Examples:

1) X = number of cars driving through an intersection in an hour.
2) X = number of accidents occurring at an intersection in a year.
3) X = number of alpha particles emitted by a sample of U-238 over a period of time.

The common characteristics of Poisson processes are these: We divide the interval of time (distance, etc.) into a large number of equal subintervals.

1) The probability of occurrence of more than one count in a small subinterval is negligible;
2) The probability of occurrence of one count in a small subinterval is the same for all equal subintervals, and is proportional to the length of the subinterval;
3) The count in each small subinterval is independent of other subintervals.

We let X = count of occurrences in the entire interval.

Defn: A discrete r.v. X is said to have a Poisson distribution with mean λ if the p.m.f. of the distribution is

f(x) = e^(−λ) λ^x / x!, for x = 0, 1, 2, 3, ….

The mean and variance of the distribution are μ = E[X] = λ and V(X) = λ.
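The Poisson p.m.f. is easy to compute directly. This minimal sketch (function name mine) also checks numerically that the probabilities sum to 1 and that the mean equals λ, truncating the infinite sums once the terms are negligible:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    # f(x) = e^(-lambda) * lambda^x / x!
    return exp(-lam) * lam**x / factorial(x)

lam = 2.0
print(round(poisson_pmf(0, lam), 4))  # P(X = 0) = e^-2, about 0.1353

# Terms beyond x = 49 are vanishingly small for lambda = 2:
total = sum(poisson_pmf(x, lam) for x in range(50))
mean = sum(x * poisson_pmf(x, lam) for x in range(50))
print(round(total, 6), round(mean, 6))  # 1.0 2.0
```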
Note: We may derive the Poisson distribution as a limiting case of the binomial distribution, with the number of trials going to infinity and the probability of success on each trial going to 0 in such a way that the mean of the distribution remains constant. This is done in the textbook.

Example 1: The number of cracks in a section of interstate highway that are significant enough to require repair is assumed to follow a Poisson distribution with a mean of 2 cracks per mile. First, does this situation actually satisfy the Poisson conditions?

a) What is the probability that there are no cracks that require repair in a 5-mile section of highway?
b) What is the probability that at least one crack requires repair in a ½-mile section of highway?

Example 2: Contamination is a problem in the manufacture of optical storage disks. The number of particles of contamination that occur on an optical disk has a Poisson distribution, and the average number of particles per square centimeter of media surface is 0.1. The area of a disk under study is 100 square centimeters. Find the probability that 12 particles occur in the area under study.

Poisson Processes

Any random process that satisfies the Poisson conditions is called a Poisson process.

Example: p. 107.
Example: p. 108.

Another example occurs in the insurance industry. Assume that an insurance company has a large number of policyholders who have a certain type of auto-insurance policy. Under certain general assumptions, accidents occur for policyholders according to a Poisson process with an expected value of λ. Hence, claims for accidents are also modeled by a Poisson process.

The Geometric Distribution

Let's go back to the binomial experiment, and relax one of the conditions of the experiment; i.e., we no longer require that the number of trials in the experiment be fixed.
We still require that the trials be performed identically, that they be independent of each other, that each trial have two possible outcomes (Success or Failure), and that the probability of Success be the same, p, for each trial. Let X = the number of trials that must be performed to encounter the first Success.

Defn: A discrete random variable X is said to have a Geometric(p) distribution if its p.m.f. is given by

f(x) = P(X = x) = p(1 − p)^(x−1), for x = 1, 2, 3, ….

The mean of the Geometric(p) distribution is μ = 1/p. The variance of the distribution is σ² = (1 − p)/p².

Example: p. 110, Exercise 4.62
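The geometric p.m.f. and its mean and variance can likewise be checked numerically (a sketch with names of my choosing; the infinite sums are truncated far into the tail, where the remaining terms are negligible):

```python
def geometric_pmf(x, p):
    # f(x) = p * (1 - p)^(x - 1), for x = 1, 2, 3, ...
    return p * (1 - p)**(x - 1)

p = 0.25
xs = range(1, 400)  # (1-p)^399 is astronomically small, so the tail is negligible
total = sum(geometric_pmf(x, p) for x in xs)
mean = sum(x * geometric_pmf(x, p) for x in xs)
var = sum(x**2 * geometric_pmf(x, p) for x in xs) - mean**2
print(round(total, 6), round(mean, 6), round(var, 6))
# 1.0 4.0 12.0  -- matching mu = 1/p and sigma^2 = (1-p)/p^2
```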