Probability Models

3.1 Probability concepts
This section is an introduction to some of the basic concepts of probability. Let's begin by considering what
we mean by probability.
3.1.1 Probabilities of outcomes and events.
Example 1.1. A company manufactures diodes. Based on constant testing, they feel that the
probability that a diode is defective is 0.3%, i.e.
(1.1)
The probability that a diode is defective = 0.003,
The probability that a diode is good = 0.997.
What do we mean by this? One interpretation is based on frequency of occurrence. If we look at a large
number of diodes, the proportion of defectives is approximately 0.003 and the proportion of good diodes is
approximately 0.997. More precisely
(1.2)
(# defective)/(# diodes) → Pr{d} as # diodes → ∞,
(# good)/(# diodes) → Pr{g} as # diodes → ∞.
Here we abbreviate defective and good by d and g and "the probability that a diode is defective" by Pr{d}
and similarly for good. Since the relation (1.2) involves taking the limit as # diodes → ∞, it looks like we
may not be able to determine Pr{d} and Pr{g} exactly. We can estimate these values by the ratios
(# defective)/(# diodes) and (# good)/(# diodes)
using a large number of diodes. In fact, the values 0.003 and 0.997 used for Pr{d} and Pr{g} in (1.1) were
probably obtained in this fashion. This is typical of real world probability models where we must use
estimated values for the presumed underlying probabilities.
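The frequency interpretation in (1.2) can be illustrated with a short simulation. The sketch below (the function name and the use of Python's random module are our own choices, not part of the example) draws diodes that are independently defective with the presumed probability 0.003 and estimates Pr{d} by the ratio (# defective)/(# diodes).

```python
import random

random.seed(0)  # for reproducibility

# Presumed underlying probability that a diode is defective.
PR_DEFECTIVE = 0.003

def estimate_pr_defective(n_diodes):
    """Estimate Pr{d} by the ratio (# defective)/(# diodes)."""
    defective = sum(random.random() < PR_DEFECTIVE for _ in range(n_diodes))
    return defective / n_diodes

# The estimate gets closer to 0.003 as # diodes grows.
for n in (1_000, 100_000):
    print(n, estimate_pr_defective(n))
```

With a small batch the estimate is noisy; as the number of diodes grows, the proportion settles near 0.003, which is exactly the sense in which the limit (1.2) is meant.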
Example 1.2. Suppose we roll a six-sided die, and we say the following.
The probability, Pr{ 1 }, of getting a 1 is 1/6,
The probability, Pr{ 2 }, of getting a 2 is 1/6,
....
The probability, Pr{ 6 }, of getting a 6 is 1/6.
Using a frequency interpretation of probability similar to example 1, if we roll the die a large number of
times, then we expect that the proportion of times it comes up 1 will be approximately 1/6, and similarly for
2, 3, ..., 6. For example, if we roll the die 600 times, then we expect to get approximately 100 1's. More
precisely,
(# times a 1 comes up)/(# rolls) → Pr{1} as # rolls → ∞,
and similarly for 2, 3, ..., 6.
Example 1.3. Each day a newsstand buys and sells The Wall Street Journal. Based on records for the
past month they feel that they would never sell more than 4 copies in any day. Furthermore they feel
that
(1.3)
The probability, Pr{0}, of selling zero copies in a given day = 0.21,
The probability, Pr{1}, of selling one copy in a given day = 0.26,
Pr{2} = 0.32,
Pr{3} = 0.16,
Pr{4} = 0.05.
Here we are assuming that the newsstand has enough copies of The Wall Street Journal in stock to satisfy
the demand of all the customers who want to buy it in a given day. If we look at the number of copies of
The Wall Street Journal sold each day for a number of consecutive days, then the proportion of times none
are sold is approximately 0.21, and similarly for 1, 2, 3, 4. More precisely
(1.4)
(# days none are sold)/(# days) → Pr{0} as # days → ∞,
and similarly for 1, 2, 3, 4.
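The same idea can be sketched in code for the newsstand: draw simulated daily sales from the assumed distribution (1.3) and check that the observed proportions approach the stated probabilities. The variable names below are illustrative.

```python
import random
from collections import Counter

random.seed(1)  # for reproducibility

# Assumed demand distribution from (1.3).
pmf = {0: 0.21, 1: 0.26, 2: 0.32, 3: 0.16, 4: 0.05}

# Simulate many days of sales and compare empirical proportions to the pmf.
days = 100_000
sales = random.choices(list(pmf), weights=list(pmf.values()), k=days)
counts = Counter(sales)
for copies in sorted(pmf):
    print(copies, pmf[copies], counts[copies] / days)
```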
Summary and Terminology: We observe something. This is called an experiment in many books. For
example, rolling a die and observing the face that comes up is an experiment. When we do an experiment
there are certain possible things we may observe. These are called outcomes. For example, when we roll a
die, the outcomes are 1, 2, 3, 4, 5, and 6. The set of all possible outcomes is called the sample space. Often
we will denote the sample space by S. (Ω is another symbol used for the sample space in many texts.)
S = {d, g} in Example 1.1, S = {1, 2, 3, 4, 5, 6} in Example 1.2 and S = {1, 2, 3, 4} in Example 1.3. For
each outcome a in S one has the probability of a occurring Pr{a}. Intuitively, it is the proportion of times
the outcome occurs if we repeat the same experiment independently many times, i.e.
(1.5)
Pr{a} = lim_{n → ∞} (# times a occurs in n repetitions)/n
Remarks: Even though this description of probability has a nice intuitive appeal, it has some problems
if one tries to use it as a definition. One reason is that the notion of repeating the same
experiment independently a number of times has not been defined. If we flip a coin or roll a die
repeatedly, then this is pretty close to independent repetitions of the same experiment. However if we
observe the number of newspapers sold each day for a number of days, then this is probably not
independent repetitions of the same experiment. Despite this problem, let us proceed using the
above concept of probability as a guide to our thoughts.
In the examples above, there were only a finite number of outcomes, i.e. S = {a1, a2, ..., an}. In
this case formula (1.5) would imply
(1.6)
Pr{ai} ≥ 0 for each i, and Σ_{i=1}^{n} Pr{ai} = 1.
If we are going to use estimated or assumed values for the actual probabilities, then we should make
sure that they satisfy (1.6). This is true in examples 1.1, 1.2 and 1.3. The set of values
Pr{a1}, ..., Pr{an} is called the probability distribution for the outcomes of the experiment. We can
represent a probability distribution by the row vector p = (p1, …, pn) or the column vector p = (p1, …, pn)^T,
where pi = Pr{ai}. Later we shall see some situations where it might be advantageous to use a row vector
instead of a column vector or vice versa. For example, in Example 1.3 one has
p = (0.21, 0.26, 0.32, 0.16, 0.05). The components of such a vector p should be nonnegative numbers
that sum to one. A vector with this property is called a probability vector.
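The probability-vector property (nonnegative components that sum to one) is easy to check with a small helper; the function name is_probability_vector is our own.

```python
def is_probability_vector(p, tol=1e-9):
    """True if every component is nonnegative and the components sum to one."""
    return all(x >= 0 for x in p) and abs(sum(p) - 1.0) < tol

# The distribution from Example 1.3 is a probability vector.
print(is_probability_vector((0.21, 0.26, 0.32, 0.16, 0.05)))  # True
print(is_probability_vector((0.5, 0.6)))                      # False
```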
Often it is informative to make a graph of the probability distribution. Here is a graph of the
probability distribution in Example 1.3.
[Bar graph of the probability distribution in Example 1.3: bars of height 0.21, 0.26, 0.32, 0.16, and 0.05 over the outcomes 0, 1, 2, 3, and 4.]
Example 1.4. Suppose in the context of Example 1.1 we look at two diodes and observe whether each
one is defective or good. An outcome consists of observing the condition of the first diode together
with the condition of the second diode. For example, the first diode might be good and the second
might be defective which we abbreviate by gd. Thus there are 4 possible outcomes for the two diodes,
namely gg, gd, dg and dd. The sample space S = {gg, gd, dg, dd}. Suppose after some testing we
arrive at the following estimates of the probabilities of these outcomes.
Pr{ gg } = 0.9943
Pr{ gd } = 0.0027
Pr{ dg } = 0.0027
Pr{ dd } = 0.0003
We might still be interested in questions like "What is the probability the first diode is good?" or "What is
the probability the second diode is defective?". These questions can be viewed as asking if the outcome of
an experiment belongs to a certain set. For example, the first diode being good corresponds to the outcome
belonging to the set E = {gg, gd}, while the second diode being defective corresponds to the outcome
belonging to the set F = {gd, dd}. In probability theory, a set of outcomes is called an event.
We are interested in probabilities of events as well as probabilities of individual outcomes. Using the
intuitive description of probability above, we would say that the probability of a certain event E is the
proportion of times the outcome lies in the set E if we repeat the same experiment independently many
times, i. e.
(1.7)
Pr{E} = lim_{n → ∞} (# times outcome is in E in n independent repetitions)/n
If E = {b1, b2, ..., bm} is a finite set then it would follow from (1.5) and (1.7) that the probability of the
event E is the sum of the probabilities of the outcomes in E, i.e.
(1.8)
Pr{E} = Σ_{i=1}^{m} Pr{bi}
In Example 1.4, to find the probability that the first diode is good we use E = {gg, gd} and get
Pr{E} = Pr{ {gg, gd} } = Pr{gg} + Pr{gd} = 0.9943 + 0.0027 = 0.997.
Similarly
Pr{ first diode is defective } = Pr{dg} + Pr{dd} = 0.0027 + 0.0003 = 0.003.
Pr{ second diode is good } = Pr{gg} + Pr{dg} = 0.997.
Pr{ second diode is defective } = Pr{gd} + Pr{dd} = 0.003.
Example 1.5. Consider the situation in Example 1.3. Suppose the newsstand decides to stock 2 copies of
The Wall Street Journal on a certain day. What is the probability that they will have enough copies to
satisfy all the customers that want to buy one? We are asking for the probability of the event E = {0, 1,
2}. This would be Pr{0} + Pr{1} + Pr{2} = 0.21 + 0.26 + 0.32 = 0.79.
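The computation in Example 1.5, summing the outcome probabilities over an event as in (1.8), can be sketched as a one-line helper; the function name pr_event is ours.

```python
# Demand distribution from Example 1.3.
pmf = {0: 0.21, 1: 0.26, 2: 0.32, 3: 0.16, 4: 0.05}

def pr_event(event, pmf):
    """Pr{E} as the sum of the outcome probabilities, formula (1.8)."""
    return sum(pmf[a] for a in event)

# Probability of selling at most 2 copies, the event E = {0, 1, 2}.
print(round(pr_event({0, 1, 2}, pmf), 2))  # 0.79
```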
Additivity. Formula (1.8) is actually a special case of the following more general formula. Suppose
E1, E2, ..., Em are disjoint events, i.e. no two of the sets Ei and Ej have any elements in common.
Furthermore, let E denote the union of the sets E1, ..., Em, i.e. the set of all outcomes which are in one of the
sets E1, ..., Em. Then
(1.9)
Pr{E} = Σ_{i=1}^{m} Pr{Ei}
This is called the finite additivity property of probability.
Example 1.6. Suppose in the context of Example 1.2 we roll a die. Let A = {2, 4, 6} be the event of
rolling an even number and B = {1, 3} be the event of rolling a 1 or a 3. Note that A and B are disjoint
and A ∪ B = {1, 2, 3, 4, 6} is the event of rolling anything except a 5. We have Pr{A} = 1/2,
Pr{B} = 1/3, and Pr{A ∪ B} = 5/6. The fact that Pr{A ∪ B} = Pr{A} + Pr{B} is a special case of the
above formula.
There are three other properties of probability which would follow from our informal definition.
(1.10)
0 ≤ Pr{E} ≤ 1 for any event E,
(1.11)
Pr{S} = 1,  Pr{∅} = 0,
where S is the entire sample space and ∅ is the empty set. Formula (1.10) is called the non-negativity
property of probability and formula (1.11) is sometimes called the normalization axiom.
Problem 1.1. Office Max keeps a certain number of staplers on hand. If they sell out on a certain day,
they order 6 more from the distributor and these are delivered in time for the start of the next day. Thus
the inventory at the start of a day can be 1, 2, 3, 4, 5, or 6. Based on records for the past two months
they feel that the probability, Pr{1}, of there being 1 stapler at the start of a day = 0.09, Pr{2} = 0.21,
Pr{3} = 0.29, Pr{4} = 0.23, Pr{5} = 0.12 and Pr{6} = 0.06.
(a) What is the sample space S?
Ans: S = {1, 2, 3, 4, 5, 6}
(b) Does this model satisfy condition (1.6)?
Ans: Yes.
(c) What is the probability that there are at least 3 staplers at the start of the day? What event are we
talking about?
Ans: The event is E = {3, 4, 5, 6} and Pr{E} = 0.7.
Problem 1.2. Using (1.7) show that Pr{A ∪ B} = Pr{A} + Pr{B} − Pr{A ∩ B} when A and B are not
necessarily disjoint.
3.1.2 Random variables.
Let's return to Example 1.4 where we look at two diodes and observe whether each one is defective (d) or
good (g). Previously we described that situation by considering the sample space S = {gg, gd, dg, dd}.
There is a more common way of describing this situation instead of explicitly listing the elements of the
sample space. This involves what are called random variables. Specifically, we choose a letter for the result
of each of the individual diodes. For example, we might let
X1 = result of first diode,
X2 = result of second diode.
Thus
X1(gg) = g    X2(gg) = g
X1(gd) = g    X2(gd) = d
X1(dg) = d    X2(dg) = g
X1(dd) = d    X2(dd) = d
Note that gd is just a short way of writing the ordered pair (g, d) and similarly for gg, dg and dd. With that in
mind, note that X1(i,j) = i and X2(i,j) = j where i and j represent either g or d. In fact, the outcome (i, j) of
the experiment is determined by the values of X1 and X2 since (i, j) = (X1(i,j), X2(i,j)).
X1 and X2 are examples of random variables. In general, if we have an experiment with a sample space S,
then a random variable X is a function defined on S. Thus a random variable X is a function which assigns
to each outcome a a value X(a). For example X1 assigns to the outcome gd the value g. Many probability
questions are stated most easily using random variables.
The outcome of an experiment itself is a random variable. It corresponds to the function which assigns to
each outcome a value equal to the outcome itself. So, in a sense, the notion of a random variable includes
that of an experiment.
More generally, there is often a collection of random variables whose values taken together specify the
outcome of the experiment. As noted in the example above the outcome of the experiment is just
(X1(i,j), X2(i,j)). Often books simply refer to the experiment by giving one or more random variables
without specifically mentioning what the actual outcomes are.
The following notation is quite common. If X is a random variable and x is a value, then one can consider
the set of outcomes to which X assigns the value x. This set of outcomes could be denoted by {a: X(a) = x},
but more commonly it is denoted simply by {X = x}. The probability of this event is usually denoted by
Pr{X = x} instead of the more lengthy Pr{a: X(a) = x}. It is the probability that the random variable
assumes the value x. Thus, the probability the first diode is good is denoted by Pr{X1 = g}. In the
discussion of Example 1.4 above we saw that Pr{X1 = g} = 0.997. Most books tend to use capital
letters for random variables and lower case letters for the values they may assume.
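The view of a random variable as a function on the sample space can be made concrete in code. In this sketch (using the outcome probabilities of Example 1.4; the helper name pr is ours), Pr{X = x} is computed by summing the probabilities of all outcomes a with X(a) = x.

```python
# Outcome probabilities from Example 1.4.
prob = {"gg": 0.9943, "gd": 0.0027, "dg": 0.0027, "dd": 0.0003}

# X1 and X2 are functions on the sample space: the condition of the
# first and second diode, respectively.
def X1(outcome):
    return outcome[0]

def X2(outcome):
    return outcome[1]

def pr(X, x):
    """Pr{X = x}: total probability of the outcomes a with X(a) == x."""
    return sum(p for a, p in prob.items() if X(a) == x)

print(pr(X1, "g"))  # Pr{X1 = g} = 0.9943 + 0.0027 = 0.997
print(pr(X2, "d"))  # Pr{X2 = d} = 0.0027 + 0.0003 = 0.003
```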
Example 1.7 (Two rolls of a die). We roll a die twice (or roll two dice together). The outcome
consists of observing the result of the first roll together with the result of the second roll. For example,
we might get a 3 on the first roll and a 5 on the second. Let us indicate this by writing (3, 5). Thus the
sample space consists of the following 36 possible outcomes for the two rolls.
(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
Instead of listing these 36 elements in the sample space we could equally well describe the experiment by
defining the two random variables
X1 = result of first roll,
X2 = result of second roll.
For example X1(3,4) = 3 and X2(3,4) = 4. In general X1(i,j) = i and X2(i,j) = j. As with the example of the
two diodes, the outcome (i, j) of the experiment is determined by the values of X1 and X2 since
(i, j) = (X1(i,j), X2(i,j)). Suppose, based on some testing, we feel that the probability of each outcome is
1/36, i.e. the outcomes are equally likely. The probability the first roll is a 3 is
Pr{X1 = 3} = Pr{ (3,1) } + Pr{ (3,2) } + Pr{ (3,3) } + Pr{ (3,4) } + Pr{ (3,5) } + Pr{ (3,6) }
= 1/36 + 1/36 + 1/36 + 1/36 + 1/36 + 1/36 = 1/6.
In the same way one can show that Pr{X1 = i} = 1/6 and Pr{X2 = j} = 1/6 for all i and j.
More generally, if A is a set, then one can consider those outcomes for which X assumes a value in A. This
set, {a: X(a) ∈ A}, is usually denoted by {X ∈ A}, and its probability is denoted by Pr{X ∈ A}. For
example, in Example 1.7 if A = {2, 3, 4}, then {X1 ∈ {2, 3, 4}} is the event that the first roll is a 2, 3, or 4
and Pr{X1 ∈ {2, 3, 4}} = 1/2.
One can think of a random variable as defining an experiment with less detail than the original experiment,
i.e. we temporarily ignore all aspects of the original experiment except the value of the random variable.
Example 1.8. In the context of Example 1.7 where we roll a die twice suppose we have a situation
where the sum of the two rolls is important. For example, we are playing a game where we roll two
dice, find the sum, and move our piece that many squares forward. Let T be the sum of the two rolls. T
is another random variable and one has T = X1 + X2. We are interested in questions like "What is the
probability that T is 7?". More generally we want to know the probability that T is n, i.e. Pr{ T = n }.
The sum being 7 is just the event consisting of the outcomes (1,6), (2,5), (3,4), (4,3), (5,2), and (6,1).
Therefore Pr{ T = 7 } = 6/36 = 1/6. In this fashion we can compute Pr{ T = n } for any n. In fact, we
get Pr{ T = n } = f(n), where
n      2     3     4     5     6     7     8     9     10    11    12
f(n)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Probability mass functions. Suppose we have a random variable X that takes on the set of values
{x1, x2, ..., xn}. Let f(xi) = Pr{ X = xi } be the probability that X takes on the value xi for each i. This
function f(x) is called the probability mass function of the random variable. Another way to represent the
probability mass function is by means of a probability vector p = (p1,…, pn) where pi = Pr{ X = xi }. For
example, the table of values of n and f(n) above is the probability distribution of the random variable T.
Alternatively one could say that f(n) = (6 - |n – 7|)/36 for n = 2, 3, …, 12. Here is a graph of
f(n) = (6 - |n - 7|)/36.
[Graph of f(n) = (6 - |n - 7|)/36 for n = 2, ..., 12: a triangular distribution peaking at f(7) = 6/36 ≈ 0.167.]
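Both the table and the closed form f(n) = (6 - |n - 7|)/36 can be checked by brute-force enumeration of the 36 equally likely outcomes; this sketch uses exact fractions to avoid rounding.

```python
from collections import defaultdict
from fractions import Fraction

# Enumerate the 36 equally likely outcomes (i, j) and accumulate Pr{T = n},
# where T = i + j is the sum of the two rolls.
f = defaultdict(Fraction)
for i in range(1, 7):
    for j in range(1, 7):
        f[i + j] += Fraction(1, 36)

# The enumeration agrees with the closed form f(n) = (6 - |n - 7|)/36.
for n in range(2, 13):
    assert f[n] == Fraction(6 - abs(n - 7), 36)
print(f[7])  # 1/6
```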
Example 1.8. Suppose in the context of Example 1.1 we look at three diodes. There are eight possible
outcomes for the three diodes, i.e. ggg, ggd, gdg, gdd, dgg, dgd, ddg and ddd. For example,
gdg means that the first and third diodes are good and the second is defective. Based on
past experience we assign the following probabilities to each of the outcomes.
Pr{ ggg } = (0.997)^3
Pr{ ggd } = Pr{ gdg } = Pr{ dgg } = (0.997)^2(0.003)
Pr{ gdd } = Pr{ dgd } = Pr{ ddg } = (0.997)(0.003)^2
Pr{ ddd } = (0.003)^3
Let
X1 = condition of the first diode, either g or d,
X2 = condition of the second diode,
X3 = condition of the third diode,
N = number of diodes in the batch of three that are defective.
For example X1(dgd) = d, X2(dgd) = g, X3(dgd) = d, and N(dgd) = 2. In general X1(i1,i2,i3) = i1,
X2(i1,i2,i3) = i2, X3(i1,i2,i3) = i3, and N(i1,i2,i3) = the number of values of j such that ij is equal to d. The
outcome (i1,i2,i3) of the experiment is determined by the values of X1, X2 and X3 since
(i1,i2,i3) = ( X1(i1,i2,i3), X2(i1,i2,i3), X3(i1,i2,i3) ).
Note: If we were to denote good by 0 and defective by 1 then N would be related to X1, X2 and
X3 by N = X1 + X2 + X3.
Problem 1.3. Find the probability mass functions of X1, X2, X3 and N.
Example 1.9. The administrative assistant of the accounting department is going to observe the
condition of the department's copier at the start of each day on two successive days. He will record
whether it is either in good condition, g, poor condition, p, or broken, b, for each of the two days.
Thus, there are nine outcomes for his observations over the two day period, i.e. gg, gp, gb, pg,
pp, pb, bg, bp and bb. For example, gp means that the copier is in good condition the first day
and in poor condition the second day. Let
X1 = state of the copier on the first day, either g, p or b,
X2 = state of the copier on the second day.
For example X1(gp) = g and X2(gp) = p. In general X1(i1,i2) = i1, and X2(i1,i2) = i2. The outcome (i1,i2)
of the experiment is determined by the values of X1 and X2 since (i1,i2) = ( X1(i1,i2), X2(i1,i2) ).
3.1.3 Conditional probability and independence.
Conditional probability. Conditional probabilities are a way to adjust the probability of something
happening according to how much information one has.
Example 1.10. Suppose, in the context of Example 1.4, we test two diodes. Using the probabilities in that
example, we saw that the probability that the second diode is defective is 0.003. However, suppose we
test the first diode and we find that it is defective. Does this affect the probability that the second diode
is defective?
Here is one possible way of looking at this. We take a large number of pairs of diodes and test both diodes
in each pair. We ignore any pairs where the first diode is good. Of those which have the first diode
defective, we count the number in which the second diode is also defective. The fraction
(3.1)
(# pairs in which both diodes are defective)/(# pairs in which the first diode is defective)
is approximately the probability that the second diode is defective after we observe that the first diode is
defective. Note that
(3.2)
(# in which both are defective)/(# in which the first is defective)
= [(# in which both are defective)/(total # of pairs)] / [(# in which the first is defective)/(total # of pairs)]
→ Pr{both are defective}/Pr{first is defective} = 0.0003/0.003 = 0.1,
as the number of pairs → ∞. Thus it appears that the observation that the first diode is defective does affect
the probability that the second diode is defective.
This is an example of conditional probabilities. If A and B are two events, then we are interested in the
probability that the outcome is in A given that the outcome is in B. This is called the conditional
probability of A given B and is denoted by Pr{A | B}. Intuitively, we may interpret this as meaning the
following. We perform the experiment a large number of times with each one independent of the others.
We ignore any times the outcome is not in B, and of the times the outcome is in B, we count the number of
times the outcome is also in A. Then
(3.3)
(# times outcome is in A ∩ B)/(# times outcome is in B) → Pr{A | B},
as the number of repetitions → ∞.
Note that if we divide the top and bottom of the fraction in (3.3) by the total number of times we do the
experiment, then we get
(3.4)
[(# times outcome is in A ∩ B)/(# repetitions)] / [(# times outcome is in B)/(# repetitions)] → Pr{A | B}.
The fraction in the top of (3.4) approaches Pr{A ∩ B} and the fraction in the bottom approaches Pr{B} as
the number of times we do the experiment → ∞. So we have
(3.5)
Pr{A | B} = Pr{A ∩ B} / Pr{B},
provided Pr{B} is not 0. Most texts take this as the definition of Pr{A | B}. In Example 1.10
Pr{ the second diode is defective | the first is defective } = 0.1.
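Formula (3.5) applied to Example 1.10 can be sketched as follows; the helper names pr and pr_given are our own.

```python
# Outcome probabilities from Example 1.4.
prob = {"gg": 0.9943, "gd": 0.0027, "dg": 0.0027, "dd": 0.0003}

def pr(event):
    """Pr{E}: sum of the outcome probabilities over the event E."""
    return sum(prob[a] for a in event)

def pr_given(A, B):
    """Conditional probability Pr{A | B} = Pr{A and B} / Pr{B}, as in (3.5)."""
    return pr(A & B) / pr(B)

second_defective = {"gd", "dd"}
first_defective = {"dg", "dd"}

# Pr{second defective | first defective} = 0.0003 / 0.003 = 0.1
print(pr_given(second_defective, first_defective))
```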
Example 1.11. Suppose in the context of Example 1.2 we roll a die. What is the conditional
probability that the number coming up is even given that the number coming up is 4 or larger?
Here we are asking for Pr{A | B} where A = {2, 4, 6} is the event that we get an even number and
B = {4, 5, 6} is the event that the number is 4 or larger. Using (3.5) we have
Pr{A | B} = Pr{A ∩ B}/Pr{B} = (1/3)/(1/2) = 2/3.
Problem 1.4. You own stock in Megabyte Computer Corporation. You estimate that there is an 80%
chance of Megabyte making a profit if it makes a certain technological breakthrough, but only a 30%
chance of making a profit if they don't make the breakthrough. Furthermore, you estimate that there is
a 40% chance of its making the breakthrough. Suppose before you can find out if they made the
breakthrough, you go on a 6 month vacation to Tahiti. Then one day you receive the following
message from your stockbroker. "Megabyte made a profit." What is the probability that Megabyte
made the breakthrough?
Ans: 0.64
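The answer to Problem 1.4 comes from the total probability of a profit followed by the conditional probability formula (Bayes' rule). Here is the arithmetic as a sketch, with variable names of our own choosing.

```python
# Assumed estimates from Problem 1.4.
pr_breakthrough = 0.40
pr_profit_given_b = 0.80        # Pr{profit | breakthrough}
pr_profit_given_not_b = 0.30    # Pr{profit | no breakthrough}

# Total probability of a profit.
pr_profit = (pr_profit_given_b * pr_breakthrough
             + pr_profit_given_not_b * (1 - pr_breakthrough))

# Bayes' rule: Pr{breakthrough | profit} = 0.32 / 0.50 = 0.64.
pr_b_given_profit = pr_profit_given_b * pr_breakthrough / pr_profit
print(pr_b_given_profit)
```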
Independent events. Earlier we used the term independent in an informal way to indicate repetitions of an
experiment which are in some way unrelated. In most books the notion of independence is defined in terms
of probability instead of vice versa. This is done as follows.
We say that an event A is independent of another event B if the probability that the outcome is in A is the
same as the probability that the outcome is in A given that the outcome lies in B. In symbols
Pr{A | B} = Pr{A}.
Thus the knowledge that B has occurred doesn't give one any information regarding the probability that A
has occurred.
Example 1.12. In Example 1.4 where we took two diodes, the second diode being defective is not
independent of the first diode being defective. This is because
Pr{ the second diode is defective | the first is defective } = 0.1 (as we saw above) but
Pr{ the second diode is defective } = 0.003 (as we saw in connection with Example 1.4).
Example 1.13. In Example 1.11, getting an even number is not independent of getting 4 or larger since
Pr{even | 4 or larger } = 2/3, while Pr{even} = 1/2. On the other hand if C = {3, 4, 5, 6} is the event
of getting 3 or larger then it is not hard to show that Pr{A | C} = 1/2 so that getting an even number is
independent of getting 3 or larger.
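The claim in Example 1.13 that Pr{A | C} = 1/2 is easy to verify by direct computation with exact fractions; this sketch is our own.

```python
from fractions import Fraction

# Each face of a fair die has probability 1/6, as in Example 1.2.
prob = {face: Fraction(1, 6) for face in range(1, 7)}

def pr(E):
    """Pr{E}: sum of the face probabilities over the event E."""
    return sum(prob[a] for a in E)

A = {2, 4, 6}      # even number
B = {4, 5, 6}      # 4 or larger
C = {3, 4, 5, 6}   # 3 or larger

# A is not independent of B: Pr{A | B} = 2/3 differs from Pr{A} = 1/2.
assert pr(A & B) / pr(B) == Fraction(2, 3)

# A is independent of C: Pr{A | C} = Pr{A} = 1/2.
assert pr(A & C) / pr(C) == pr(A) == Fraction(1, 2)
print("checks pass")
```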
Since Pr{A | B} = Pr{A ∩ B}/Pr{B}, the condition Pr{A | B} = Pr{A} that A be independent of B can be stated as
(3.6)
Pr{A ∩ B} = Pr{A} Pr{B}.
Thus, A is independent of B if the probability of both A and B occurring is the product of their probabilities.
One consequence of this is that if A is independent of B, then B is independent of A, i.e. the definition is
symmetric in the two sets A and B.
The formula (3.6) makes sense even if Pr{B} is 0, while the original formula (3.5) does not. Most texts use
(3.6) as the basic definition of independent events.
Problem 1.5. a) You roll a die twice as in the above example. Consider the event where the sum of
the numbers on the two rolls is 7. Is this independent of rolling a 1 on the first roll?
Ans: Yes.
b) Let B be the event of rolling a 1 on the first roll or second roll or both. Is the event where the sum of
the numbers on the two rolls is 7 independent of B?
Ans: No.
Independent random variables: two random variables. In Example 1.12 we saw that the second diode
being defective is not independent of the first diode being defective. However, in Example 1.7 it is not hard
to see that rolling a 3 on the second roll is independent of rolling a 5 on the first roll, i.e. the fact that we
know that a 5 turned up on the first roll doesn't change the probability that a 3 will come up on the second
roll. Let {X1 = 5} denote the event that the result of the first roll is a 5 and {X2 = 3} be the event that the
result of the second roll is a 3 and {X1 = 5, X2 = 3} be the event that the first roll is a 5 and the second roll
is a 3. One has
{X2 = 3} = { (1,3), (2,3), (3,3), (4,3), (5,3), (6,3) }
{X1 = 5} = { (5,1), (5,2), (5,3), (5,4), (5,5), (5,6) }
{X1 = 5, X2 = 3} = { (5,3) }.
Therefore, Pr{X2 = 3} = 1/6, Pr{X1 = 5} = 1/6, Pr{X1 = 5, X2 = 3} = 1/36. So,
Pr{X2 = 3 | X1 = 5} = (1/36)/(1/6) = 1/6, which is the same as Pr{X2 = 3}. This shows rolling a 3 on the
second roll is independent of rolling a 5 on the first roll.
The same argument shows that the probability of rolling any given number on the second roll is independent
of rolling any given number on the first roll, i.e.
(3.7)
Pr{X1 = i, X2 = j} = Pr{X1 = i} Pr{X2 = j}.
This leads us to the notion of independent random variables. Let X and Y be two random variables. X and
Y are independent if knowledge of the values of one of them doesn't influence the probability that the other
assumes various values. If X and Y only take on a finite or countably infinite number of values and have
probability mass functions f(x) and g(y) respectively, then this is equivalent to
(3.8)
Pr{X = x, Y = y} = Pr{X = x} Pr{Y = y} = f(x) g(y)
for any x and y. For example, X1 and X2 in Example 1.7 are independent. However, if X1 and X2 are the
condition of the first and second diode in Example 1.4, then they are not independent.
Example 1.14. Consider the situation in Example 1.4 where we took two diodes. Let X1 and X2 be
the condition of the first and second diode. Suppose the probability mass functions of X1 and X2 are given by
f(g) = Pr{ X1 = g } = 0.997
f(d) = Pr{ X1 = d } = 0.003
h(g) = Pr{ X2 = g } = 0.997
h(d) = Pr{ X2 = d } = 0.003
Furthermore, suppose X1 and X2 are independent. Find the probabilities of the four outcomes gg, gd,
dg and dd. Using the independence one has
Pr{ gg } = Pr{ X1 = g and X2 = g } = Pr{ X1 = g } Pr{ X2 = g } = (0.997)(0.997) = 0.994009
Pr{ gd } = Pr{ X1 = g and X2 = d } = Pr{ X1 = g } Pr{ X2 = d } = (0.997)(0.003) = 0.002991
Pr{ dg } = Pr{ X1 = d and X2 = g } = Pr{ X1 = d } Pr{ X2 = g } = (0.003)(0.997) = 0.002991
Pr{ dd } = Pr{ X1 = d and X2 = d } = Pr{ X1 = d } Pr{ X2 = d } = (0.003)(0.003) = 0.000009
This illustrates a common situation where we describe a probability model by giving one or more
random variables along with their probability mass functions together with some information, such as
independence, about the joint behavior of the random variables. We do this instead of listing the
elements in the sample space along with their probabilities.
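The computation in Example 1.14, building the joint probabilities of independent random variables as products of their marginals as in (3.8), can be sketched as:

```python
# Marginal probability mass functions from Example 1.14.
f = {"g": 0.997, "d": 0.003}   # pmf of X1
h = {"g": 0.997, "d": 0.003}   # pmf of X2

# Under independence, each joint probability is the product of the
# marginals, as in (3.8): Pr{X1 = i, X2 = j} = f(i) h(j).
joint = {(i, j): f[i] * h[j] for i in f for j in h}
for outcome, p in sorted(joint.items()):
    print(outcome, p)

# The four joint probabilities still sum to one.
assert abs(sum(joint.values()) - 1.0) < 1e-12
```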
Independent random variables: several random variables. More generally, if we have a collection of
random variables, X1, ..., Xn, then they are independent if knowledge of the values of some of the variables
doesn't change the probability that the others assume various values. If they only take on a finite or
countably infinite number of values and have probability mass functions fi(x), then this is equivalent to
(3.9)
Pr{ X1 = x1, ..., Xn = xn } = Pr{X1 = x1} ... Pr{Xn = xn} = f1(x1) ... fn(xn)
for any x1, ..., xn.
Example 1.15. Consider a situation similar to Example 1.4, except we take n diodes. Let Xi be the
condition of the ith diode for i = 1, 2, …, n. Suppose all the Xi have the same probability mass function
given by
fi(g) = Pr{ Xi = g } = 0.997
fi(d) = Pr{ Xi = d } = 0.003
Furthermore, suppose all the Xi are independent. Find the probability that all n diodes are good. Using
the independence one has
Pr{ gg…g } = Pr{ X1 = g and X2 = g and … and Xn = g }
= Pr{ X1 = g } Pr{ X2 = g } … Pr{ Xn = g }
= (0.997) (0.997) … (0.997) = (0.997)n
Problem 1.6. A store sells two types of tables: plain and deluxe. When a customer buys a table, there is
an 80% chance that it will be a plain table. Assume that the type of table that one customer buys is
independent of the types of the tables that any other customer buys.
a. On a certain day five tables are sold. What is the probability that they will all be plain?
b. Suppose that on each of the days Monday, Tuesday, and Wednesday five tables are sold. What is
the probability that all the tables sold on Monday and Tuesday are plain and that on Wednesday at
least one deluxe table is sold?