Primer on Probability Trees - The University of Texas at Arlington

Primer on Probability Trees
Basic definitions
Probability is a number between zero and one that tells us the likelihood of an event (near 0
means very unlikely, near 1 means very likely).
A trial is an operation whose outcome is uncertain, such as tossing a coin.
An event is one of the outcomes that can occur when a trial occurs, such as tossing a coin and
getting heads.
If we use the letter E to denote an event (such as tossing a coin and getting heads), then the
probability of the event’s occurrence is
Pr{E}
There are two ways to assign a specific number to this probability:
1. List all the events that can occur when a trial happens, such as heads and tails when
tossing a coin. Then, decide logically what the probability of each event is based on the
nature of the trials and events. When tossing a coin, the probability of getting heads is
0.5, and the probability of getting tails is also 0.5, because both outcomes are equally
likely.
2. Think about, or carry out, an experiment in which the same trial is repeated a very large
number of times, approaching infinity. The probability of event E is the number of times
it occurs, divided by the number of trials. For example, if we tossed a coin a billion
billion times, we would expect the proportion of times that we got heads to be very close
to 0.5.
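The second approach can be tried out with a short simulation. The sketch below is only an illustration: it assumes Python's pseudo-random number generator is an acceptable stand-in for a fair coin, and the function name `heads_frequency` and the fixed seed are hypothetical choices that do not come from this primer.

```python
import random

def heads_frequency(n_trials, seed=0):
    """Toss a simulated fair coin n_trials times; return the proportion of heads."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    heads = sum(1 for _ in range(n_trials) if rng.random() < 0.5)
    return heads / n_trials

# The proportion of heads tends to settle near 0.5 as the number of trials grows.
for n in (10, 1_000, 100_000):
    print(n, heads_frequency(n))
```

With only a few trials the proportion can be far from 0.5; it is the long run of trials that brings it close.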
A probability tree represents the events that can happen during one or more trials. Each trial is
represented by a layer of branches. Each branch corresponds to one possible outcome of the trial,
and each branch is labeled with the probability of the outcome.
To represent the trial of tossing a coin once, the probability tree is relatively simple and has one
layer of branches:
Start
  |-- 0.5 --> Heads
  |-- 0.5 --> Tails
To represent a complex process of several trials and their outcomes, we add a layer of branches for
each trial. Here is the probability tree representing the process of tossing a coin twice:
Start
  |-- 0.5 --> Heads
  |             |-- 0.5 --> Heads
  |             |-- 0.5 --> Tails
  |-- 0.5 --> Tails
                |-- 0.5 --> Heads
                |-- 0.5 --> Tails
Calculating with probability trees
Probability trees can be used to identify all the outcomes that can occur during a complex
process of multiple trials, and calculate their probabilities.
Each pathway along the branches through a tree corresponds to one possible outcome of the
complex process. Here there are four possibilities:
Heads, Heads
Heads, Tails
Tails, Heads
Tails, Tails
For each possible outcome corresponding to one pathway through the branches, the probability
that it occurs is the product of all the probabilities along the branches. So the probabilities of the
four outcomes we identified are:
Heads, Heads: probability = 0.25
Heads, Tails: probability = 0.25
Tails, Heads: probability = 0.25
Tails, Tails: probability = 0.25
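The product rule for pathways can be sketched in a few lines of Python. This is a minimal illustration, not part of the original primer; the names `toss` and `path_probabilities` are hypothetical, and the sketch assumes every layer of the tree uses the same branch probabilities, as in the coin example.

```python
from itertools import product
from math import prod

# Branch probabilities for a single toss of a fair coin.
toss = {"Heads": 0.5, "Tails": 0.5}

def path_probabilities(branches, n_trials):
    """Map each pathway (a tuple of outcomes) to the product of its branch probabilities."""
    return {
        path: prod(branches[outcome] for outcome in path)
        for path in product(branches, repeat=n_trials)
    }

two_tosses = path_probabilities(toss, 2)
for path, pr in two_tosses.items():
    print(", ".join(path), "probability =", pr)
```

Calling `path_probabilities(toss, 3)` would enumerate the eight pathways of a three-layer tree in the same way.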
Once we have defined all possible outcomes and their probabilities, we can calculate the
probabilities of more complex events that combine the possible outcomes in different ways.
The probability of a complex event is the sum of the probabilities of the simpler events identified
in the tree.
Example: What is the probability of getting different sides of the coin when it is tossed twice?
Answer: This can happen two ways, according to the tree. Either Heads, then Tails, or Tails then
Heads. Each of these two outcomes has probability 0.25. So the sum is 0.5, and this is the
probability of getting two different sides when tossing a coin twice.
Example: What is the probability of getting the same side of the coin both times when it is tossed
twice?
Answer: Again, this can happen two ways, according to the tree. Either Heads occurs both times,
or Tails occurs both times. Each of these two outcomes has probability 0.25. So the sum is 0.5,
and this is the probability of getting the same side both times when tossing a coin twice.
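Both coin examples follow the same pattern: pick out the pathways that belong to the event, then add their probabilities. A small sketch of that pattern, with the hypothetical helper name `event_probability` and assuming a fair coin tossed twice:

```python
from itertools import product

# The four pathways for two tosses of a fair coin, each with probability 0.5 * 0.5.
paths = {path: 0.5 * 0.5 for path in product(("Heads", "Tails"), repeat=2)}

def event_probability(paths, belongs_to_event):
    """Sum the probabilities of the pathways for which belongs_to_event(path) is true."""
    return sum(pr for path, pr in paths.items() if belongs_to_event(path))

pr_different = event_probability(paths, lambda path: path[0] != path[1])
pr_same = event_probability(paths, lambda path: path[0] == path[1])
print(pr_different, pr_same)
```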
Example:
When we sample individuals at random from a population that contains a proportion p of
individuals of one type and a proportion q of individuals of another type, the probability of
getting an individual of the first type is p, and of getting an individual of the second type is q.
Suppose we sample individuals from a population of fruit flies where 30% are black and 70% are
gray. Here is the probability tree for sampling one individual:
Start
  |-- 0.3 --> Black
  |-- 0.7 --> Gray
We want to know the probability of getting two flies of the same color when we sample two flies
at random.
We calculate this by making a tree with two layers:
Start
  |-- 0.3 --> Black
  |             |-- 0.3 --> Black
  |             |-- 0.7 --> Gray
  |-- 0.7 --> Gray
                |-- 0.3 --> Black
                |-- 0.7 --> Gray
Our complex event is that we sample the two flies and they are the same color. There are two
ways that this can happen, either both flies are black or both are gray.
So we have to work along the paths corresponding to each of these possibilities, and calculate
their probabilities by multiplying together the probabilities on the branches.
Path 1 – 2 black flies: 0.3 X 0.3 = 0.09
Path 2 – 2 gray flies: 0.7 X 0.7 = 0.49
Either of these outcomes satisfies the definition of our event of getting two flies of the same
color, so now we add together these probabilities.
Pr{2 flies same color} = 0.09 + 0.49 = 0.58
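The same calculation works for any proportions p and q. The sketch below uses the hypothetical function name `pr_same_type` and assumes, as the tree does, that the second fly is drawn with the same probabilities as the first (a large population, or sampling with replacement).

```python
def pr_same_type(p, q):
    """Probability that two random samples are the same type, given proportions p and q."""
    both_first = p * p    # path 1: first type, then first type
    both_second = q * q   # path 2: second type, then second type
    return both_first + both_second  # the two paths are added together

pr_same_color = pr_same_type(0.3, 0.7)
print(round(pr_same_color, 2))
```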
To summarize, probability trees allow us to break a complex event down into simpler events and
combine probabilities following two rules:
1. The probability of an outcome corresponding to one path is the product of the probabilities
along the branches.
2. The probability of an event with more than one path is the sum of the probabilities of the paths
involved.
Additional rules about calculating with probabilities:
1. The probability of an event is always between 0 and 1.
0 ≤ Pr{E} ≤ 1
A probability of zero means an event never happens, and a probability of one means it always
happens.
2. The sum of the probabilities of all possible events is one.
If the set of all possible events is E1, E2, …, Ek then
Pr{E1} + Pr{E2} + … + Pr{Ek} = 1
For example, on the probability trees we just looked at, when we get to the end of the four
branches, we get to all possible events. You can check and see that the probabilities all add up to
one.
3. The probability that an event E does not happen is one minus the probability of the event.
If we denote “the event E does not happen” symbolically as E^C, then
Pr{E^C} = 1 − Pr{E}
The notation here uses the superscript C to mean “complement”, in the sense that the
complement of the event “E happens” is the event “E does not happen”.
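Rules 2 and 3 can be checked on the fruit fly tree. The sketch below is an illustration with hypothetical variable names: it takes E to be "both flies are gray", so E^C is "at least one fly is black", and compares the complement rule with a direct sum over pathways.

```python
from itertools import product
from math import isclose

colors = {"Black": 0.3, "Gray": 0.7}
# Every pathway for sampling two flies, with the product of its branch probabilities.
paths = {(a, b): colors[a] * colors[b] for a, b in product(colors, repeat=2)}

total = sum(paths.values())                 # rule 2: all pathways together
pr_both_gray = paths[("Gray", "Gray")]      # Pr{E}
pr_at_least_one_black = 1 - pr_both_gray    # rule 3: Pr{E^C} = 1 - Pr{E}
direct = sum(pr for path, pr in paths.items() if "Black" in path)

print(isclose(total, 1.0), isclose(pr_at_least_one_black, direct))
```

Using the complement is often the shorter route: one pathway is subtracted from 1 instead of three pathways being added.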
By themselves, these rules are too basic to do all the calculations we might want to do with
probabilities, so we need some more rules. In particular we need rules to deal with compound
events, meaning complex events made up of simpler events.
When we combine two events to make a more complex event, an important consideration is
whether the events are disjoint, or not.
Disjoint means the two events cannot happen simultaneously, by definition.
Consider the population of fruit flies where some are black and some are gray. Here are some
examples of disjoint events:
E1 = sample two flies and get both the same color
E2 = sample two flies and get two different colors

E1 = sample two flies and get two black flies
E2 = sample two flies and get two gray flies
If two events can happen simultaneously, then they are not disjoint. Here are some examples from the
fly population again:
E1 = sample two flies and get both the same color
E2 = sample two flies and get two males

E1 = sample two flies and get both the same color
E2 = sample two flies and get at least one black fly
Events that are not disjoint may or may not be likely to occur together. But it is logically possible for
them both to occur, given the way they are defined.
Now we can introduce two more rules for situations where we have events E1 and E2 and we
want to think about the event
Either E1 or E2 happens
Just to be clear, when we say “or” we mean that this complex event happens when E1 happens,
when E2 happens, or when they both happen.
4. If two events E1 and E2 are disjoint, then
Pr{E1 or E2} = Pr{E1} + Pr{E2}
5. If two events E1 and E2 are not disjoint, then
Pr{E1 or E2} = Pr{E1} + Pr{E2} – Pr{E1 and E2}
Here the statement “E1 and E2” means the complex event that both E1 and E2 happen, not just
one or the other.
For events that are not disjoint, it is logically possible for both of them to happen so that
Pr{E1 and E2} > 0
But for disjoint events, they can’t both happen, so
Pr{E1 and E2} = 0
Therefore, rule 4 is really just a special case of rule 5, which applies to all events, and simplifies
to rule 4 when the two events are disjoint.
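As a check on rules 4 and 5, events can be treated as sets of pathways through the two-fly tree. The sketch below is illustrative and uses hypothetical names; it takes E1 = "both flies the same color" and E2 = "at least one black fly", two events that are not disjoint because the pathway Black, Black belongs to both.

```python
from itertools import product
from math import isclose

colors = {"Black": 0.3, "Gray": 0.7}
paths = {(a, b): colors[a] * colors[b] for a, b in product(colors, repeat=2)}

def pr(event):
    """Probability of an event given as a set of pathways."""
    return sum(paths[path] for path in event)

same_color = {p for p in paths if p[0] == p[1]}          # E1
at_least_one_black = {p for p in paths if "Black" in p}  # E2

# Rule 5: subtract the overlap so pathways in both events are not counted twice.
by_rule_5 = pr(same_color) + pr(at_least_one_black) - pr(same_color & at_least_one_black)
directly = pr(same_color | at_least_one_black)
print(by_rule_5, directly)
```

For a disjoint pair, such as "two black flies" and "two gray flies", the intersection is empty, its probability is 0, and the same expression reduces to rule 4.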
For events that are not disjoint, they could both happen, and we might want to do calculations
with the probability that both happen.
To deal with this, we need to consider one more way that two events can be related. We have
already dealt with disjoint and non-disjoint events.
Now we have to talk about independent and dependent events.
Two events E1 and E2 are independent if the probability that E2 happens is unrelated to whether
E1 has happened and vice-versa.
The two events are dependent if the probability of E2 changes if E1 has happened.
Very often, when events are dependent, some kind of cause and effect relationship is going on
underneath these events. For example, when event E1 happens it could change things directly so
that event E2 becomes more likely. Or perhaps some third event that we don’t know about has
happened, and it makes both E1 and E2 more likely. That would make the probability of E2
related to the probability of E1.
When events are dependent the concept of conditional probability becomes important.
Conditional probability is the probability that one event happens, given that we know another
event has happened. We write it like this:
Pr{E2|E1}
We read this notation as the “probability that E2 happens, given that E1 has happened”, or simply
as the “probability that E2 happens given E1.”
Note that when we write this expression, it indicates that we know or assume that E1 has
happened, and we are regarding E2 as uncertain. The order that E2 and E1 appear in this
expression does not depend on their order in time, or which events might be “cause” and
“effect”.
The event we regard as uncertain is written to the left of the vertical line, and the one we know
or assume has happened is written to the right.
The conditional probability can be contrasted with the simple probability
Pr{E2}
This is just the overall probability that event E2 happens, without any information on the event
E1.
When two events are dependent
Pr{E2|E1} ≠ Pr{E2}
When two events are independent
Pr{E2|E1} = Pr{E2}
Mathematically, the conditional probability of E2 given E1 is defined as
Pr{E2|E1} = Pr{E2 and E1} / Pr{E1}
Example: We are interested in hair and eye color in a group of men living in Germany, and have
data on 2190 men in our group:
              Brown Eyes   Blue Eyes
Dark Hair        300          200
Light Hair       420         1270
We define two events:
E1 = a man has dark hair
E2 = a man has blue eyes
We are interested in the conditional probability that a man has blue eyes, given that he has dark
hair.
Pr{E2|E1}
In the sample for this example, 500 men have dark hair and 1470 have blue eyes; 200 men have
both dark hair and blue eyes.
From these data we apply the principle that the probability of an individual having a particular
characteristic is the proportion of individuals who have that characteristic in the larger
population:
Pr{E1} = 500 / 2190 = 0.2283
Pr{E2} = 1470 / 2190 = 0.6712
Pr{E1 and E2} = 200 / 2190 = 0.0913
Now, from the definition of conditional probability we can calculate
Pr{E2|E1} = Pr{E1 and E2} / Pr{E1}
= 0.0913 / 0.2283 = 0.4000
Now compare this to the simple probability that a man has blue eyes, which is Pr{E2}.
Pr{E2} = 0.6712 > Pr{E2|E1} = 0.4000
Therefore, having blue eyes depends on hair color, and in this case having dark hair makes it less
likely that a man will also have blue eyes.
In fact, this result is fairly typical. Not every human population would have exactly the same
numbers as this group of German men, but genes for hair and eye color are related in such a way
that if someone has dark hair, they are also likely to have a gene that produces brown eyes
instead of blue eyes.
This means that these two traits are not independent, and blue-eyed, dark-haired people tend to
be uncommon in human populations.
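The hair and eye color calculation above can be reproduced directly from the table of counts. The sketch below is a minimal illustration with hypothetical names (`counts`, `pr`); it uses exact fractions so that rounding does not blur the comparison between Pr{E2|E1} and Pr{E2}.

```python
from fractions import Fraction

# Counts from the table: (hair, eyes) -> number of men.
counts = {
    ("Dark", "Brown"): 300, ("Dark", "Blue"): 200,
    ("Light", "Brown"): 420, ("Light", "Blue"): 1270,
}
total = sum(counts.values())

def pr(condition):
    """Proportion of men whose (hair, eyes) pair satisfies condition."""
    return Fraction(sum(n for key, n in counts.items() if condition(*key)), total)

pr_e1 = pr(lambda hair, eyes: hair == "Dark")                            # Pr{E1}
pr_e2 = pr(lambda hair, eyes: eyes == "Blue")                            # Pr{E2}
pr_e1_and_e2 = pr(lambda hair, eyes: hair == "Dark" and eyes == "Blue")  # Pr{E1 and E2}
pr_e2_given_e1 = pr_e1_and_e2 / pr_e1                                    # definition

print(float(pr_e2), float(pr_e2_given_e1))
```

Because the total cancels in the division, Pr{E2|E1} is just 200 / 500: the proportion of blue-eyed men among the dark-haired men.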
Now that we know what conditional probability is, we can look at two more rules for taking the
probabilities of simple events and combining them into the probabilities of more complex events.
6. If two events E1 and E2 are independent then
Pr{E1 and E2} = Pr{E1} X Pr{E2}
7. If two events E1 and E2 are not independent then
Pr{E1 and E2} = Pr{E1} X Pr{E2|E1}
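To see why the choice between rules 6 and 7 matters, both can be applied to the hair and eye color data. This is a sketch with hypothetical variable names, assuming the counts given earlier (2190 men, 500 with dark hair, 1470 with blue eyes, 200 with both).

```python
from fractions import Fraction

n_total, n_dark, n_blue, n_dark_and_blue = 2190, 500, 1470, 200

pr_e1 = Fraction(n_dark, n_total)                    # Pr{E1}: dark hair
pr_e2 = Fraction(n_blue, n_total)                    # Pr{E2}: blue eyes
pr_e2_given_e1 = Fraction(n_dark_and_blue, n_dark)   # Pr{E2|E1}

rule_7 = pr_e1 * pr_e2_given_e1   # uses the conditional probability
rule_6 = pr_e1 * pr_e2            # would only be right if the events were independent

print(float(rule_7), float(rule_6))
```

Rule 7 recovers the observed proportion 200 / 2190, while rule 6 gives a larger number, which is another sign that the two traits are dependent in these data.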
Conditional probability is an extremely important concept, but it’s also a confusing concept.
There are a lot of situations where people tend to confuse one conditional probability with the
reverse one:
Pr{E2|E1} versus Pr{E1|E2}
These numbers are not usually equal, and just because one is large or small does not make the
other one large or small.
Application of conditional probability and probability trees to diagnostic medical tests
One application of conditional probability where confusion often arises is in medical testing.
Medical tests are not perfect, so if someone tests positive for a disease, that doesn’t necessarily
mean they have the disease.
Suppose you are tested for a fatal disease, something like cancer or AIDS. How should you react
to a positive test result?
If you get tested for a disease you are interested in this conditional probability
Pr{have disease | positive test}
Medical tests are designed to be good in the sense that the reverse conditional probability is
high. That reverse conditional probability is
Pr{test positive | have disease}
This conditional probability is called the sensitivity of the test, and for a good test we want this
to be high, meaning a probability close to 1. For example, we might have
Pr{test positive | have disease} = 0.95
We want a good test to be likely to be positive when someone really has the disease. We also
want it to be likely to be negative when someone does not have the disease.
This aspect involves the conditional probability
Pr{test negative | don’t have disease}
This conditional probability is called the specificity of the test. For a good test, we want the
specificity to be a high probability, so let’s say we have
Pr{test negative | don’t have disease} = 0.90
Suppose you are in the position of taking a test for which the sensitivity is 0.95 and the
specificity is 0.90, and you have just tested positive. How certain can you be that you have the
fatal disease?
Let’s analyze this situation by means of a probability tree. We’re going to need one additional
hypothetical assumption to do so. We need to know the probability that a person has the disease,
without considering whether or not they’ve been tested. That is, we need to know the prevalence
of the disease in the population at large.
Let’s assume that this is a moderately uncommon disease, with
Pr{have disease} = 0.08
meaning that 8% of people at large have the disease.
Now let’s construct a probability tree showing all the possibilities for this situation. We’ll start
with the simple probability that a person has the disease based on this prevalence of the disease
in the whole population:
Start
  |-- 0.08 --> Have disease
  |-- 0.92 --> Don't have disease
We have to start here because the outcome of the test will depend on whether an individual
actually has the disease.
Next we can add branches involving the test results, and calculate the probabilities of all possible
outcomes:
Start
  |-- 0.08 --> Have disease
  |              |-- 0.95 --> Test positive, Pr = 0.076
  |              |-- 0.05 --> Test negative, Pr = 0.004
  |-- 0.92 --> Don't have disease
                 |-- 0.1 --> Test positive, Pr = 0.092
                 |-- 0.9 --> Test negative, Pr = 0.828
Now let’s get back to the conditional probability that someone is most interested in if they’ve
had a positive result.
Pr{have disease | positive test}
Our definition of conditional probability says that this is equal to
Pr{have disease and positive test} / Pr{positive test}
The probability of both having the disease and getting a positive test result is the uppermost
branch of the tree, and we calculated this to be
Pr{have disease and positive test} = 0.076
The probability of a positive test is calculated by noticing that two of the four possible branches
of the tree involve a positive test:
Pr{have disease and positive test}
Pr{don’t have disease and positive test}
The first of these probabilities was just calculated to be 0.076
The second of these probabilities corresponds to the third branch down, and we calculated a
probability of
Pr{don’t have disease and positive test} = 0.092
Notice that these two events leading to a positive test are disjoint (events corresponding to
different pathways through a probability tree are always disjoint), so we can calculate the
probability of a positive test by the addition rule for disjoint events:
Pr{positive test} =
Pr{have disease and positive test} +
Pr{don’t have disease and positive test}
= 0.076 + 0.092 = 0.168
Now we can finish calculating the conditional probability we want
Pr{have disease | positive test}
= Pr{have disease and positive test} / Pr{positive test}
= 0.076 / 0.168
= 0.452
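The whole calculation can be collected into one small function. The sketch below is an illustration rather than a clinical tool; the function name `pr_disease_given_positive` and its parameter names are hypothetical, and it assumes the test is described completely by its sensitivity and specificity.

```python
def pr_disease_given_positive(prevalence, sensitivity, specificity):
    """Pr{have disease | positive test}, following the two positive-test pathways."""
    true_positive = prevalence * sensitivity                # have disease, test positive
    false_positive = (1 - prevalence) * (1 - specificity)   # no disease, test positive
    pr_positive = true_positive + false_positive            # disjoint pathways, so add
    return true_positive / pr_positive

posterior = pr_disease_given_positive(prevalence=0.08, sensitivity=0.95, specificity=0.90)
print(round(posterior, 3))
```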
This is rather low. Even though you’ve just had a positive test for a fatal disease, the chances are
less than 50% that you really have the disease.
Is this surprising?
Most people find it surprising, probably for two reasons.
1. There is a natural tendency to confuse two different conditional probabilities
Pr{have disease | positive test} versus Pr{test positive | have disease}
The latter probability is high because the test is a good test, with good sensitivity. But that
doesn’t mean the first probability is high.
2. The prevalence of the disease in the population at large plays a big role in these calculations,
and many people don’t understand this.
Many fatal diseases are uncommon or rare. That is, the prevalence is low, and most people in the
population at large do not have the disease.
Whenever a disease is uncommon or rare, the probability calculation will show that
Pr{have disease | positive test}
is lower than you might expect: often lower than 50%, and typically well below the
high probability associated with the sensitivity of the test.
If a disease is very rare, then
Pr{have disease | positive test}
is likely to be a very small number, something like 0.05.
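The effect of prevalence is easy to see by repeating the tree calculation for several values. The sketch below keeps the sensitivity of 0.95 and specificity of 0.90 from the example; the prevalence values other than 0.08 are hypothetical, chosen only to show the trend.

```python
def pr_disease_given_positive(prevalence, sensitivity=0.95, specificity=0.90):
    """Pr{have disease | positive test} from the two positive-test pathways."""
    true_positive = prevalence * sensitivity
    false_positive = (1 - prevalence) * (1 - specificity)
    return true_positive / (true_positive + false_positive)

# Hypothetical prevalences, from common to very rare (0.08 is the example in the text).
results = {prev: pr_disease_given_positive(prev) for prev in (0.5, 0.08, 0.01, 0.005)}
for prev, posterior in results.items():
    print(f"prevalence {prev}: Pr(have disease | positive test) = {posterior:.3f}")
```

With the test held fixed, the probability of having the disease after a positive result falls steadily as the disease becomes rarer.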
So, most medical tests are somewhat inconclusive, meaning that a positive test indicates that a
disease might be present, but does not make it certain that a disease is present.
Because of this, medical doctors don’t make a diagnosis based on a single test result. Instead, if a
person tests positive, the doctor follows up with further tests and examinations, to be certain that
the disease is present, before starting a course of treatment.
These issues of conditional probability are also related to the kinds of errors that medical tests
can make.
Diagnostic tests like those we’re talking about can have two kinds of correct results and two
kinds of errors.
The correct results are
True positive – someone who really has the disease tests positive.
True negative – someone who really does not have the disease tests negative.
The errors are
False negative – someone who really has the disease tests negative.
False positive – someone who really does not have the disease tests positive.
In constructing our probability tree we found the probabilities for all these possible correct
results and errors:
Start
  |-- 0.08 --> Have disease
  |              |-- 0.95 --> Test positive, Pr = 0.076, True positive
  |              |-- 0.05 --> Test negative, Pr = 0.004, False negative
  |-- 0.92 --> Don't have disease
                 |-- 0.1 --> Test positive, Pr = 0.092, False positive
                 |-- 0.9 --> Test negative, Pr = 0.828, True negative
Notice that the probability of a false positive is higher than the probability of a true positive. In
medical testing situations where the disease is uncommon in the population at large, the
probability of a false positive is often higher than the probability of a true positive.
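The four end points of the tree can be computed together, which makes the comparison between false positives and true positives explicit. This is a sketch with the hypothetical function name `outcome_probabilities`, using the prevalence, sensitivity, and specificity assumed in the example.

```python
def outcome_probabilities(prevalence, sensitivity, specificity):
    """Probability of each of the four pathways through the diagnostic-test tree."""
    return {
        "true positive": prevalence * sensitivity,
        "false negative": prevalence * (1 - sensitivity),
        "false positive": (1 - prevalence) * (1 - specificity),
        "true negative": (1 - prevalence) * specificity,
    }

outcomes = outcome_probabilities(prevalence=0.08, sensitivity=0.95, specificity=0.90)
for name, pr in outcomes.items():
    print(f"{name}: {pr:.3f}")
```

The four probabilities add up to one, as they must, since every person tested ends up on exactly one of the four pathways.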
Exercises:
1. Researchers are interested in whether wearing seat belts protects against fatal injury in
automobile accidents. They compiled the following data from all people involved in automobile
accidents in the state of Florida:
              No Seat Belt   Seat Belt
Fatal             1,601          510
Not Fatal       162,527      412,368

Using these data, calculate
a. The simple probability that a seat belt is not worn.
b. The simple probability that a fatal injury occurs.
c. The probability that both a fatal injury occurs and a seat belt is not worn.
d. The conditional probability that a fatal injury occurs, given that a seat belt was not
worn.
e. Is the event of having a fatal injury independent of not wearing a seat belt?
2. In the fall semester of 2007, 52% of the students at UT Arlington were women. Suppose that
two students in the University are selected at random. Using the probability tree method,
calculate:
a. The probability that both students are men.
b. The probability that at least one student is a man.
4. The following figures are typical for situations involving cancer and diagnostic tests. The
prevalence of the cancer is 160 per 100,000 people, that is, the simple probability that an
individual in the population at large has the disease is 0.0016. The diagnostic test for the disease
has a probability of 0.85 of turning out positive for an individual who has the disease. The
diagnostic test has a probability of 0.03 of turning out positive for an individual who does not
have the disease. Calculate the following conditional probabilities:
a. The conditional probability that a person has the disease, given that they have had a
positive test result, Pr{have disease | positive test}.
b. The conditional probability that a person does not have the disease, given that they
have had a negative test result, Pr{don’t have disease | negative test}.