Probability Models

3.1 Probability concepts

This section is an introduction to some of the basic concepts of probability. Let's begin by considering what we mean by probability.

3.1.1 Probabilities of outcomes and events.

Example 1.1. A company manufactures diodes. Based on constant testing, they feel that the probability that a diode is defective is 0.3%, i.e.

(1.1)   The probability that a diode is defective = 0.003,
        The probability that a diode is good = 0.997.

What do we mean by this? One interpretation is based on frequency of occurrence. If we look at a large number of diodes, the proportion of defectives is approximately 0.003 and the proportion of good diodes is approximately 0.997. More precisely,

(1.2)   Pr{d} = lim (# defective)/(# diodes) as # diodes → ∞,
        Pr{g} = lim (# good)/(# diodes) as # diodes → ∞.

Here we abbreviate defective and good by d and g, and "the probability that a diode is defective" by Pr{d}, and similarly for good. Since the relation (1.2) involves taking a limit as # diodes → ∞, it looks like we may not be able to determine Pr{d} and Pr{g} exactly. We can estimate these values by the ratios (# defective)/(# diodes) and (# good)/(# diodes) for a large number of diodes. In fact, the values 0.003 and 0.997 used for Pr{d} and Pr{g} in (1.1) were probably obtained in this fashion. This is typical of real-world probability models, where we must use estimated values for the presumed underlying probabilities.

Example 1.2. Suppose we roll a six-sided die, and we say the following.

        The probability, Pr{1}, of getting a 1 is 1/6,
        The probability, Pr{2}, of getting a 2 is 1/6,
        ...
        The probability, Pr{6}, of getting a 6 is 1/6.

Using a frequency interpretation of probability similar to Example 1.1, if we roll the die a large number of times, then we expect that the proportion of times it comes up 1 will be approximately 1/6, and similarly for 2, 3, ..., 6. For example, if we roll the die 600 times, then we expect to get approximately 100 1's.
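This expectation is easy to check with a short simulation (a sketch; the function name and sample sizes are our choices, not part of the example):

```python
import random

random.seed(0)  # for reproducibility

def proportion_of_ones(n_rolls):
    """Roll a fair die n_rolls times and return the proportion of 1's."""
    ones = sum(1 for _ in range(n_rolls) if random.randint(1, 6) == 1)
    return ones / n_rolls

# With 600 rolls we expect roughly 100 ones, i.e. a proportion near 1/6;
# the proportion settles toward 1/6 as the number of rolls grows.
for n in (600, 60_000, 600_000):
    print(n, proportion_of_ones(n))
```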
More precisely,

        Pr{1} = lim (# times a 1 comes up)/(# rolls) as # rolls → ∞,

and similarly for 2, 3, ..., 6.

Example 1.3. Each day a newsstand buys and sells The Wall Street Journal. Based on records for the past month, they feel that they would never sell more than 4 copies in any day. Furthermore, they feel that

        The probability, Pr{0}, of selling zero copies in a given day = 0.21,
        The probability, Pr{1}, of selling one copy in a given day = 0.26,
(1.3)   Pr{2} = 0.32,   Pr{3} = 0.16,   Pr{4} = 0.05.

Here we are assuming that the newsstand has enough copies of The Wall Street Journal in stock to satisfy the demand of all the customers who want to buy it in a given day. If we look at the number of copies of The Wall Street Journal sold each day for a number of consecutive days, then the proportion of days on which none are sold is approximately 0.21, and similarly for 1, 2, 3, 4. More precisely,

(1.4)   Pr{0} = lim (# days none are sold)/(# days) as # days → ∞,

and similarly for 1, 2, 3, 4.

Summary and Terminology: We observe something. This is called an experiment in many books. For example, rolling a die and observing the face that comes up is an experiment. When we do an experiment there are certain possible things we may observe. These are called outcomes. For example, when we roll a die, the outcomes are 1, 2, 3, 4, 5, and 6. The set of all possible outcomes is called the sample space. Often we will denote the sample space by S. (Ω is another symbol used for the sample space in many texts.) S = {d, g} in Example 1.1, S = {1, 2, 3, 4, 5, 6} in Example 1.2, and S = {0, 1, 2, 3, 4} in Example 1.3. For each outcome a in S one has the probability Pr{a} of a occurring. Intuitively, it is the proportion of times the outcome occurs if we repeat the same experiment independently many times, i.e.

(1.5)   Pr{a} = lim (# times a occurs in n repetitions)/n as n → ∞.

Remarks: Even though this description of probability has a nice intuitive appeal, it has some problems if one tries to use it as a definition.
One reason for this is that the notion of repeating the same experiment independently a number of times has not been defined. If we flip a coin or roll a die repeatedly, then this is pretty close to independent repetitions of the same experiment. However, if we observe the number of newspapers sold each day for a number of days, then this is probably not independent repetitions of the same experiment. Despite this problem, let us proceed using the above concept of probability as a guide to our thoughts.

In the examples above there were only a finite number of outcomes, i.e. S = {a1, a2, ..., an}. In this case formula (1.5) would imply

(1.6)   Pr{ai} ≥ 0 for each i, and Pr{a1} + Pr{a2} + ... + Pr{an} = 1.

If we are going to use estimated or assumed values for the actual probabilities, then we should make sure that they satisfy (1.6). This is true in Examples 1.1, 1.2 and 1.3. The set of values Pr{a1}, ..., Pr{an} is called the probability distribution for the outcomes of the experiment. We can represent a probability distribution by the row vector p = (p1, ..., pn) or the corresponding column vector, where pi = Pr{ai}. Later we shall see some situations where it might be advantageous to use a row vector instead of a column vector or vice versa. For example, in Example 1.3 one has p = (0.21, 0.26, 0.32, 0.16, 0.05). The components of such a vector p should be nonnegative numbers that sum to one. A vector with this property is called a probability vector.

Often it is informative to make a graph of the probability distribution. [Figure: bar graph of the distribution in Example 1.3, with bars of heights 0.21, 0.26, 0.32, 0.16 and 0.05 over the outcomes 0 through 4.]

Example 1.4. Suppose in the context of Example 1.1 we look at two diodes and observe whether each one is defective or good. An outcome consists of observing the condition of the first diode together with the condition of the second diode. For example, the first diode might be good and the second might be defective, which we abbreviate by gd.
Thus there are 4 possible outcomes for the two diodes, namely gg, gd, dg and dd. The sample space is S = {gg, gd, dg, dd}. Suppose after some testing we arrive at the following estimates of the probabilities of these outcomes.

        Pr{gg} = 0.9943    Pr{gd} = 0.0027
        Pr{dg} = 0.0027    Pr{dd} = 0.0003

We might still be interested in questions like "What is the probability the first diode is good?" or "What is the probability the second diode is defective?". These questions can be viewed as asking whether the outcome of an experiment belongs to a certain set. For example, the first diode being good corresponds to the outcome belonging to the set E = {gg, gd}, while the second diode being defective corresponds to the outcome belonging to the set F = {gd, dd}. In probability theory, a set of outcomes is called an event. We are interested in probabilities of events as well as probabilities of individual outcomes. Using the intuitive description of probability above, we would say that the probability of an event E is the proportion of times the outcome lies in the set E if we repeat the same experiment independently many times, i.e.

(1.7)   Pr{E} = lim (# times outcome is in E in n independent repetitions)/n as n → ∞.

If E = {b1, b2, ..., bm} is a finite set, then it would follow from (1.5) and (1.7) that the probability of the event E is the sum of the probabilities of the outcomes in E, i.e.

(1.8)   Pr{E} = Pr{b1} + Pr{b2} + ... + Pr{bm}.

In Example 1.4, to find the probability that the first diode is good we use E = {gg, gd} and get

        Pr{E} = Pr{{gg, gd}} = Pr{gg} + Pr{gd} = 0.9943 + 0.0027 = 0.997.

Similarly,

        Pr{first diode is defective} = Pr{dg} + Pr{dd} = 0.0027 + 0.0003 = 0.003,
        Pr{second diode is good} = Pr{gg} + Pr{dg} = 0.997,
        Pr{second diode is defective} = Pr{gd} + Pr{dd} = 0.003.

Example 1.5. Consider the situation in Example 1.3. Suppose the newsstand decides to stock 2 copies of The Wall Street Journal on a certain day.
What is the probability that they will have enough copies to satisfy all the customers who want to buy one? We are asking for the probability of the event E = {0, 1, 2}. This is Pr{0} + Pr{1} + Pr{2} = 0.21 + 0.26 + 0.32 = 0.79.

Additivity. Formula (1.8) is actually a special case of the following more general formula. Suppose E1, E2, ..., Em are disjoint events, i.e. no two of the sets Ei and Ej have any elements in common. Furthermore, let E denote the union of the sets E1, ..., Em, i.e. the set of all outcomes which are in at least one of the sets E1, ..., Em. Then

(1.9)   Pr{E} = Pr{E1} + Pr{E2} + ... + Pr{Em}.

This is called the finite additivity property of probability.

Example 1.6. Suppose in the context of Example 1.2 we roll a die. Let A = {2, 4, 6} be the event of rolling an even number and B = {1, 3} be the event of rolling a 1 or a 3. Note that A and B are disjoint, and A ∪ B = {1, 2, 3, 4, 6} is the event of rolling anything except a 5. We have Pr{A} = 1/2, Pr{B} = 1/3, and Pr{A ∪ B} = 5/6. The fact that Pr{A ∪ B} = Pr{A} + Pr{B} is a special case of the above formula.

There are three other properties of probability which follow from our informal definition:

(1.10)  0 ≤ Pr{E} ≤ 1 for any event E,
(1.11)  Pr{S} = 1 and Pr{∅} = 0,

where S is the entire sample space and ∅ is the empty set. Formula (1.10) is called the non-negativity property of probability, and formula (1.11) is sometimes called the normalization axiom.

Problem 1.1. Office Max keeps a certain number of staplers on hand. If they sell out on a certain day, they order 6 more from the distributor and these are delivered in time for the start of the next day. Thus the inventory at the start of a day can be 1, 2, 3, 4, 5, or 6. Based on records for the past two months they feel that the probability, Pr{1}, of there being 1 stapler at the start of a day = 0.09, and Pr{2} = 0.21, Pr{3} = 0.29, Pr{4} = 0.23, Pr{5} = 0.12 and Pr{6} = 0.06.

(a) What is the sample space S?
Ans: S = {1, 2, 3, 4, 5, 6}.

(b) Does this model satisfy condition (1.6)? Ans: Yes.

(c) What is the probability that there are at least 3 staplers at the start of the day? What event are we talking about? Ans: The event is E = {3, 4, 5, 6} and Pr{E} = 0.7.

Problem 1.2. Using (1.7), show that Pr{A ∪ B} = Pr{A} + Pr{B} - Pr{A ∩ B} when A and B are not necessarily disjoint.

3.1.2 Random variables.

Let's return to Example 1.4 where we look at two diodes and observe whether each one is defective (d) or good (g). Previously we described that situation by considering the sample space S = {gg, gd, dg, dd}. There is a more common way of describing this situation instead of explicitly listing the elements of the sample space. This involves what are called random variables. To begin, we choose letters for the result of each of the individual diodes. For example, we might let

        X1 = result of the first diode,
        X2 = result of the second diode.

Thus

        X1(gg) = g    X2(gg) = g
        X1(gd) = g    X2(gd) = d
        X1(dg) = d    X2(dg) = g
        X1(dd) = d    X2(dd) = d

Note that gd is just a short way of writing the ordered pair (g, d), and similarly for gg, dg and dd. With that in mind, note that X1(i,j) = i and X2(i,j) = j, where i and j each represent either g or d. In fact, the outcome (i, j) of the experiment is determined by the values of X1 and X2, since (i, j) = (X1(i,j), X2(i,j)). X1 and X2 are examples of random variables. In general, if we have an experiment with a sample space S, then a random variable X is a function defined on S. Thus a random variable X is a function which assigns to each outcome a a value X(a). For example, X1 assigns to the outcome gd the value g. Many probability questions are stated most easily using random variables. The outcome of an experiment itself is a random variable. It corresponds to the function which assigns to each outcome a value equal to the outcome itself. So, in a sense, the notion of a random variable includes that of an experiment.
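The idea that a random variable is just a function on the sample space can be made concrete in code (a sketch using the outcome probabilities of Example 1.4; the variable names are our choices):

```python
# Example 1.4: sample space with estimated outcome probabilities
pr = {"gg": 0.9943, "gd": 0.0027, "dg": 0.0027, "dd": 0.0003}

def X1(outcome):
    """Result of the first diode: a function defined on the sample space."""
    return outcome[0]

def X2(outcome):
    """Result of the second diode."""
    return outcome[1]

# Pr{X1 = g}: sum Pr{a} over the event {a : X1(a) = g} = {gg, gd}
pr_first_good = sum(p for a, p in pr.items() if X1(a) == "g")
print(round(pr_first_good, 3))  # → 0.997
```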
More generally, there is often a collection of random variables whose values taken together specify the outcome of the experiment. As noted in the example above, the outcome of the experiment is just (X1(i,j), X2(i,j)). Often books simply refer to the experiment by giving one or more random variables without specifically mentioning what the actual outcomes are.

The following notation is quite common. If X is a random variable and x is a value, then one can consider the set of outcomes to which X assigns the value x. This set of outcomes could be denoted by {a: X(a) = x}, but more commonly it is denoted simply by {X = x}. The probability of this event is usually denoted by Pr{X = x} instead of the lengthier Pr{a: X(a) = x}. It is the probability that the random variable assumes the value x. Thus, the probability the first diode is good is denoted by Pr{X1 = g}. In the discussion of Example 1.4 above we saw that Pr{X1 = g} = 0.997. Most books tend to use capital letters for random variables and lower case letters for the values they may assume.

Example 1.7 (Two rolls of a die). We roll a die twice (or roll two dice together). The outcome consists of observing the result of the first roll together with the result of the second roll. For example, we might get a 3 on the first roll and a 5 on the second. Let us indicate this by writing (3, 5). Thus the sample space consists of the following 36 possible outcomes for the two rolls.

        (1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
        (2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
        (3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
        (4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
        (5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
        (6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

Instead of listing these 36 elements in the sample space, we could equally well describe the experiment by defining the two random variables

        X1 = result of the first roll,
        X2 = result of the second roll.

For example, X1(3,4) = 3 and X2(3,4) = 4. In general X1(i,j) = i and X2(i,j) = j.
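Rather than listing the 36 outcomes by hand, the sample space and the coordinate random variables can be generated mechanically (a minimal sketch):

```python
from itertools import product

# Sample space for two rolls of a die: all ordered pairs (i, j)
S = list(product(range(1, 7), repeat=2))
print(len(S))  # → 36

def X1(outcome):
    """Result of the first roll: the first coordinate of the outcome."""
    return outcome[0]

def X2(outcome):
    """Result of the second roll: the second coordinate."""
    return outcome[1]

print(X1((3, 4)), X2((3, 4)))  # → 3 4
```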
As with the example of the two diodes, the outcome (i, j) of the experiment is determined by the values of X1 and X2, since (i, j) = (X1(i,j), X2(i,j)). Suppose, based on some testing, we feel that the probability of each outcome is 1/36, i.e. the outcomes are equally likely. The probability the first roll is a 3 is

        Pr{X1 = 3} = Pr{(3,1)} + Pr{(3,2)} + Pr{(3,3)} + Pr{(3,4)} + Pr{(3,5)} + Pr{(3,6)}
                   = 1/36 + 1/36 + 1/36 + 1/36 + 1/36 + 1/36 = 1/6.

In the same way one can show that Pr{X1 = i} = 1/6 and Pr{X2 = j} = 1/6 for all i and j.

More generally, if A is a set, then one can consider those outcomes for which X assumes a value in A. This set, {a: X(a) ∈ A}, is usually denoted by {X ∈ A}, and its probability is denoted by Pr{X ∈ A}. For example, in Example 1.7 if A = {2, 3, 4}, then {X1 ∈ {2, 3, 4}} is the event that the first roll is a 2, 3, or 4, and Pr{X1 ∈ {2, 3, 4}} = 1/2. One can think of a random variable as defining an experiment with less detail than the original experiment, i.e. we temporarily ignore all aspects of the original experiment except the value of the random variable.

Example 1.8. In the context of Example 1.7 where we roll a die twice, suppose we have a situation where the sum of the two rolls is important. For example, we are playing a game where we roll two dice, find the sum, and move our piece that many squares forward. Let T be the sum of the two rolls. T is another random variable, and one has T = X1 + X2. We are interested in questions like "What is the probability that T is 7?". More generally we want to know the probability that T is n, i.e. Pr{T = n}. The sum being 7 is just the event consisting of the outcomes (1,6), (2,5), (3,4), (4,3), (5,2), and (6,1). Therefore Pr{T = 7} = 6/36 = 1/6. In this fashion we can compute Pr{T = n} for any n.
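The enumeration just described can be done mechanically: count the equally likely outcomes whose coordinates sum to n (a sketch; exact arithmetic via the standard library's fractions):

```python
from fractions import Fraction
from itertools import product

# Each of the 36 outcomes (i, j) has probability 1/36
p_outcome = Fraction(1, 36)

def pr_sum(n):
    """Pr{T = n} for T = X1 + X2: total probability of outcomes with i + j = n."""
    return sum(p_outcome for i, j in product(range(1, 7), repeat=2) if i + j == n)

print(pr_sum(7))  # → 1/6
```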
In fact, we get Pr{T = n} = f(n), where

        n      2     3     4     5     6     7     8     9     10    11    12
        f(n)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Probability mass functions. Suppose we have a random variable X that takes on the set of values {x1, x2, ..., xn}. Let f(xi) = Pr{X = xi} be the probability that X takes on the value xi for each i. This function f(x) is called the probability mass function of the random variable. Another way to represent the probability mass function is by means of a probability vector p = (p1, ..., pn), where pi = Pr{X = xi}. For example, the table of values of n and f(n) above is the probability distribution of the random variable T. Alternatively one could say that f(n) = (6 - |n - 7|)/36 for n = 2, 3, ..., 12. [Figure: graph of f(n) = (6 - |n - 7|)/36, triangular in shape, rising from 1/36 at n = 2 to a peak of 6/36 at n = 7 and falling back to 1/36 at n = 12.]

Example 1.8. Suppose in the context of Example 1.1 we look at three diodes. There are eight possible outcomes for the three diodes, i.e. ggg, ggd, gdg, gdd, dgg, dgd, ddg and ddd. For example, gdg means that the first and third diodes are good and the second is defective. Based on past experience, we assign the following probabilities to the outcomes.

        Pr{ggg} = (0.997)^3
        Pr{ggd} = Pr{gdg} = Pr{dgg} = (0.997)^2(0.003)
        Pr{gdd} = Pr{dgd} = Pr{ddg} = (0.997)(0.003)^2
        Pr{ddd} = (0.003)^3

Let

        X1 = condition of the first diode, either g or d,
        X2 = condition of the second diode,
        X3 = condition of the third diode,
        N = number of diodes in the batch of three that are defective.

For example, X1(dgd) = d, X2(dgd) = g, X3(dgd) = d, and N(dgd) = 2. In general X1(i1,i2,i3) = i1, X2(i1,i2,i3) = i2, X3(i1,i2,i3) = i3, and N(i1,i2,i3) = the number of values of j such that ij is equal to d. The outcome (i1,i2,i3) of the experiment is determined by the values of X1, X2 and X3, since (i1,i2,i3) = (X1(i1,i2,i3), X2(i1,i2,i3), X3(i1,i2,i3)).
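As a consistency check, the eight outcome probabilities for three diodes should form a probability distribution, i.e. be nonnegative and sum to one (a sketch; we parameterize by the good/defective probabilities and plug in the defect rate 0.003 from Example 1.1):

```python
def outcome_probabilities(p_good, p_defective):
    """Probabilities of the eight outcomes for three diodes, each factor
    chosen per letter of the outcome (product form as in the example)."""
    probs = {}
    for outcome in ("ggg", "ggd", "gdg", "gdd", "dgg", "dgd", "ddg", "ddd"):
        p = 1.0
        for c in outcome:
            p *= p_good if c == "g" else p_defective
        probs[outcome] = p
    return probs

probs = outcome_probabilities(0.997, 0.003)
print(abs(sum(probs.values()) - 1.0) < 1e-9)  # → True: condition (1.6) holds
```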
Note: If we were to denote good by 0 and defective by 1, then N would be related to X1, X2 and X3 by N = X1 + X2 + X3.

Problem 1.3. Find the probability mass functions of X1, X2, X3 and N.

Example 1.9. The administrative assistant of the accounting department is going to observe the condition of the department's copier at the start of each day on two successive days. He will record whether it is in good condition, g, poor condition, p, or broken, b, on each of the two days. Thus, there are nine outcomes for his observations over the two-day period, i.e. gg, gp, gb, pg, pp, pb, bg, bp and bb. For example, gp means that the copier is in good condition the first day and in poor condition the second day. Let

        X1 = state of the copier on the first day, either g, p or b,
        X2 = state of the copier on the second day.

For example, X1(gp) = g and X2(gp) = p. In general X1(i1,i2) = i1 and X2(i1,i2) = i2. The outcome (i1,i2) of the experiment is determined by the values of X1 and X2, since (i1,i2) = (X1(i1,i2), X2(i1,i2)).

3.1.3 Conditional probability and independence.

Conditional probability. Conditional probabilities are a way to adjust the probability of something happening according to how much information one has.

Example 1.10. Suppose in the context of Example 1.4 we take two diodes. Using the probabilities in that example, we saw that the probability that the second diode is defective is 0.003. However, suppose we test the first diode and find that it is defective. Does this affect the probability that the second diode is defective? Here is one possible way of looking at this. We take a large number of pairs of diodes and test both diodes in each pair. We ignore any pairs where the first diode is good. Of those which have the first diode defective, we count the number in which the second diode is also defective.
The fraction

(3.1)   (# pairs in which both are defective)/(# pairs in which the first is defective)

is approximately the probability that the second diode is defective after we observe that the first diode is defective. Note that

(3.2)   (# in which both are defective)/(# in which the first is defective)
      = [(# in which both are defective)/(total # of pairs)] / [(# in which the first is defective)/(total # of pairs)]
      → Pr{both are defective}/Pr{first is defective} = 0.0003/0.003 = 0.1,

as the number of pairs → ∞. Thus it appears that the observation that the first diode is defective does affect the probability that the second diode is defective.

This is an example of conditional probability. If A and B are two events, then we are interested in the probability that the outcome is in A given that the outcome is in B. This is called the conditional probability of A given B and is denoted by Pr{A | B}. Intuitively, we may interpret this as meaning the following. We perform the experiment a large number of times, with each repetition independent of the others. We ignore any times the outcome is not in B, and of the times the outcome is in B, we count the number of times the outcome is also in A. Then

(3.3)   (# times outcome is in A ∩ B)/(# times outcome is in B) → Pr{A | B},

as the number of repetitions → ∞. Note that if we divide the top and bottom of the fraction in (3.3) by the total number of times we do the experiment, then we get

(3.4)   [(# times outcome is in A ∩ B)/(# repetitions)] / [(# times outcome is in B)/(# repetitions)] → Pr{A | B}.

The fraction in the top of (3.4) approaches Pr{A ∩ B} and the fraction in the bottom approaches Pr{B} as the number of times we do the experiment → ∞. So we have

(3.5)   Pr{A | B} = Pr{A ∩ B}/Pr{B},

provided Pr{B} is not 0. Most texts take this as the definition of Pr{A | B}. In Example 1.10,

        Pr{the second diode is defective | the first is defective} = 0.1.

Example 1.11. Suppose in the context of Example 1.2 we roll a die.
What is the conditional probability that the number coming up is even, given that the number coming up is 4 or larger? Here we are asking for Pr{A | B}, where A = {2, 4, 6} is the event that we get an even number and B = {4, 5, 6} is the event that the number is 4 or larger. Using (3.5) we have

        Pr{A | B} = Pr{A ∩ B}/Pr{B} = (1/3)/(1/2) = 2/3.

Problem 1.4. You own stock in Megabyte Computer Corporation. You estimate that there is an 80% chance of Megabyte making a profit if it makes a certain technological breakthrough, but only a 30% chance of making a profit if they don't make the breakthrough. Furthermore, you estimate that there is a 40% chance of its making the breakthrough. Suppose before you can find out if they made the breakthrough, you go on a 6-month vacation to Tahiti. Then one day you receive the following message from your stockbroker: "Megabyte made a profit." What is the probability that Megabyte made the breakthrough? Ans: 0.64.

Independent events. Earlier we used the term independent in an informal way to indicate repetitions of an experiment which are in some way unrelated. In most books the notion of independence is defined in terms of probability instead of vice versa. This is done as follows. We say that an event A is independent of another event B if the probability that the outcome is in A is the same as the probability that the outcome is in A given that the outcome lies in B. In symbols,

        Pr{A | B} = Pr{A}.

Thus the knowledge that B has occurred doesn't give one any information regarding the probability that A has occurred.

Example 1.12. In Example 1.4, where we took two diodes, the second diode being defective is not independent of the first diode being defective. This is because Pr{the second diode is defective | the first is defective} = 0.1 (as we saw above), but Pr{the second diode is defective} = 0.003 (as we saw in connection with Example 1.4).

Example 1.13.
In Example 1.11, getting an even number is not independent of getting 4 or larger, since Pr{even | 4 or larger} = 2/3, while Pr{even} = 1/2. On the other hand, if C = {3, 4, 5, 6} is the event of getting 3 or larger, then it is not hard to show that Pr{A | C} = 1/2, so getting an even number is independent of getting 3 or larger.

Since Pr{A | B} = Pr{A ∩ B}/Pr{B}, the condition that A be independent of B can be stated as

(3.6)   Pr{A ∩ B} = Pr{A} Pr{B}.

Thus, A is independent of B if the probability of both A and B occurring is the product of their probabilities. One consequence of this is that if A is independent of B, then B is independent of A, i.e. the definition is symmetric in the two sets A and B. The formula (3.6) makes sense even if Pr{B} is 0, while the definition in terms of conditional probability does not. Most texts use (3.6) as the basic definition of independent events.

Problem 1.5. (a) You roll a die twice as in the example above. Consider the event where the sum of the numbers on the two rolls is 7. Is this independent of rolling a 1 on the first roll? Ans: Yes. (b) Let B be the event of rolling a 1 on the first roll or second roll or both. Is the event where the sum of the numbers on the two rolls is 7 independent of B? Ans: No.

Independent random variables: two random variables. In Example 1.12 we saw that the second diode being defective is not independent of the first diode being defective. However, in Example 1.7 it is not hard to see that rolling a 3 on the second roll is independent of rolling a 5 on the first roll, i.e. the fact that we know that a 5 turned up on the first roll doesn't change the probability that a 3 will come up on the second roll. Let {X1 = 5} denote the event that the result of the first roll is a 5, {X2 = 3} the event that the result of the second roll is a 3, and {X1 = 5, X2 = 3} the event that the first roll is a 5 and the second roll is a 3.
One has

        {X2 = 3} = {(1,3), (2,3), (3,3), (4,3), (5,3), (6,3)},
        {X1 = 5} = {(5,1), (5,2), (5,3), (5,4), (5,5), (5,6)},
        {X1 = 5, X2 = 3} = {(5,3)}.

Therefore Pr{X2 = 3} = 1/6, Pr{X1 = 5} = 1/6, and Pr{X1 = 5, X2 = 3} = 1/36. So

        Pr{X2 = 3 | X1 = 5} = (1/36)/(1/6) = 1/6,

which is the same as Pr{X2 = 3}. This shows rolling a 3 on the second roll is independent of rolling a 5 on the first roll. The same argument shows that rolling any given number on the second roll is independent of rolling any given number on the first roll, i.e.

(3.7)   Pr{X1 = i, X2 = j} = Pr{X1 = i} Pr{X2 = j}.

This leads us to the notion of independent random variables. Let X and Y be two random variables. X and Y are independent if knowledge of the value of one of them doesn't influence the probability that the other assumes various values. If X and Y only take on a finite or countably infinite number of values and have probability mass functions f(x) and g(y) respectively, then this is equivalent to

(3.8)   Pr{X = x, Y = y} = Pr{X = x} Pr{Y = y} = f(x) g(y)

for any x and y. For example, X1 and X2 in Example 1.7 are independent. However, if X1 and X2 are the conditions of the first and second diodes in Example 1.4, then they are not independent.

Example 1.14. Consider the situation in Example 1.4, where we took two diodes. Let X1 and X2 be the conditions of the first and second diodes. Suppose the probability mass functions are given by

        f(g) = Pr{X1 = g} = 0.997    f(d) = Pr{X1 = d} = 0.003
        h(g) = Pr{X2 = g} = 0.997    h(d) = Pr{X2 = d} = 0.003

Furthermore, suppose X1 and X2 are independent. Find the probabilities of the four outcomes gg, gd, dg and dd.
Using the independence, one has

        Pr{gg} = Pr{X1 = g and X2 = g} = Pr{X1 = g} Pr{X2 = g} = (0.997)(0.997) = 0.994009,
        Pr{gd} = Pr{X1 = g and X2 = d} = Pr{X1 = g} Pr{X2 = d} = (0.997)(0.003) = 0.002991,
        Pr{dg} = Pr{X1 = d and X2 = g} = Pr{X1 = d} Pr{X2 = g} = (0.003)(0.997) = 0.002991,
        Pr{dd} = Pr{X1 = d and X2 = d} = Pr{X1 = d} Pr{X2 = d} = (0.003)(0.003) = 0.000009.

This illustrates a common situation where we describe a probability model by giving one or more random variables along with their probability mass functions, together with some information, such as independence, about the joint behavior of the random variables. We do this instead of listing the elements in the sample space along with their probabilities.

Independent random variables: several random variables. More generally, if we have a collection of random variables X1, ..., Xn, then they are independent if knowledge of the values of some of the variables doesn't change the probability that the others assume various values. If they only take on a finite or countably infinite number of values and have probability mass functions fi(x), then this is equivalent to

(3.9)   Pr{X1 = x1, ..., Xn = xn} = Pr{X1 = x1} ... Pr{Xn = xn} = f1(x1) ... fn(xn)

for any x1, ..., xn.

Example 1.15. Consider a situation similar to Example 1.4, except we take n diodes. Let Xi be the condition of the ith diode for i = 1, 2, ..., n. Suppose all the Xi have the same probability mass function, given by

        fi(g) = Pr{Xi = g} = 0.997    fi(d) = Pr{Xi = d} = 0.003

Furthermore, suppose all the Xi are independent. Find the probability that all n diodes are good. Using the independence one has

        Pr{gg...g} = Pr{X1 = g and X2 = g and ... and Xn = g}
                   = Pr{X1 = g} Pr{X2 = g} ... Pr{Xn = g}
                   = (0.997)(0.997)...(0.997) = (0.997)^n.

Problem 1.6. A store sells two types of tables: plain and deluxe. When a customer buys a table, there is an 80% chance that it will be a plain table.
Assume that the type of table one customer buys is independent of the types of tables that any other customers buy.

(a) On a certain day five tables are sold. What is the probability that they will all be plain?

(b) Suppose that on each of the days Monday, Tuesday, and Wednesday five tables are sold. What is the probability that all the tables sold on Monday and Tuesday are plain and that on Wednesday at least one deluxe table is sold?
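To close the section, the conditional-probability and independence formulas can be checked numerically on the two-diode model of Example 1.4 (a sketch; the function names are our choices):

```python
# Outcome probabilities from Example 1.4
pr = {"gg": 0.9943, "gd": 0.0027, "dg": 0.0027, "dd": 0.0003}

def pr_event(event):
    """Formula (1.8): Pr{E} is the sum of the outcome probabilities in E."""
    return sum(pr[a] for a in event)

def conditional(A, B):
    """Formula (3.5): Pr{A | B} = Pr{A ∩ B} / Pr{B}."""
    return pr_event(A & B) / pr_event(B)

first_def = {"dg", "dd"}    # event: first diode defective
second_def = {"gd", "dd"}   # event: second diode defective

print(round(conditional(second_def, first_def), 3))  # → 0.1, as in Example 1.10
# Independence test (3.6): is Pr{A ∩ B} equal to Pr{A} Pr{B}?  Here it is not,
# so the two events are dependent, as Example 1.12 concluded.
print(abs(pr_event(second_def & first_def)
          - pr_event(second_def) * pr_event(first_def)) < 1e-12)  # → False
```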